@@ -241,7 +241,7 @@ After supervised fine-tuning, RLHF is a step used to align the LLM's answers wit
 * [Illustration RLHF](https://huggingface.co/blog/rlhf) by Hugging Face: Introduction to RLHF with reward model training and fine-tuning with reinforcement learning.
 * [Preference Tuning LLMs](https://huggingface.co/blog/pref-tuning) by Hugging Face: Comparison of the DPO, IPO, and KTO algorithms to perform preference alignment.
 * [LLM Training: RLHF and Its Alternatives](https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives) by Sebastian Raschka: Overview of the RLHF process and alternatives like RLAIF.
-* [Fine-tune Mistral-7b with DPO](https://huggingface.co/blog/dpo-trl): Tutorial to fine-tune a Mistral-7b model with DPO and reproduce [NeuralHermes-2.5](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B).
+* [Fine-tune Mistral-7b with DPO](https://mlabonne.github.io/blog/posts/Fine_tune_Mistral_7b_with_DPO.html): Tutorial to fine-tune a Mistral-7b model with DPO and reproduce [NeuralHermes-2.5](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B).
				  
 ---
 ### 6. Evaluation