@@ -1,5 +1,7 @@
# 🗣️ Large Language Model Course
+<p align="center"><a href="https://twitter.com/maximelabonne">Follow me on X</a> • <a href="https://mlabonne.github.io/blog">Blog</a> • <a href="https://github.com/PacktPublishing/Hands-On-Graph-Neural-Networks-Using-Python">Hands-on GNN</a></p>
+
The LLM course is divided into three parts:
1. 🧩 **LLM Fundamentals** covers essential knowledge about mathematics, Python, and neural networks.
@@ -144,13 +146,14 @@ While it's easy to find raw data from Wikipedia and other websites, it's difficu
Pre-training is a very long and costly process, which is why it is not the focus of this course. It's good to have some understanding of what happens during pre-training, but hands-on experience is not required.
* **Data pipeline**: Pre-training requires huge datasets (e.g., [Llama 2](https://arxiv.org/abs/2307.09288) was trained on 2 trillion tokens) that need to be filtered, tokenized, and collated with a pre-defined vocabulary.
-* **Causal language modeling**: Learn the difference between causal and masked language modeling, as well as the loss function used in this case.
+* **Causal language modeling**: Learn the difference between causal and masked language modeling, as well as the loss function used in this case (a minimal sketch follows this list). For efficient pre-training, learn more about [Megatron-LM](https://github.com/NVIDIA/Megatron-LM).
* **Scaling laws**: The [scaling laws](https://arxiv.org/pdf/2001.08361.pdf) describe the expected model performance based on the model size, dataset size, and the amount of compute used for training (a common parametric form is quoted after this list).
* **High-Performance Computing**: Out of scope here, but more knowledge about HPC is fundamental if you're planning to create your own LLM from scratch (hardware, distributed workload, etc.).
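
To make the loss function mentioned in the causal language modeling bullet concrete, here is a minimal PyTorch sketch of next-token prediction with a shifted cross-entropy loss. The tensor shapes and the toy example at the end are illustrative assumptions rather than code from any library referenced in this section.

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Next-token prediction loss used for causal language modeling.

    logits:    (batch, seq_len, vocab_size) scores produced by the model.
    input_ids: (batch, seq_len) token ids that were fed to the model.

    Unlike masked language modeling, every position is supervised by the
    token that follows it, so the targets are simply the inputs shifted left.
    """
    shift_logits = logits[:, :-1, :].contiguous()  # predictions for tokens 0..n-2
    shift_labels = input_ids[:, 1:].contiguous()   # targets are tokens 1..n-1
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )

# Toy check with random tensors; a real run would use model(input_ids).logits.
batch, seq_len, vocab_size = 2, 16, 100
logits = torch.randn(batch, seq_len, vocab_size)
input_ids = torch.randint(0, vocab_size, (batch, seq_len))
print(causal_lm_loss(logits, input_ids))  # roughly log(vocab_size) for random logits
```

For the scaling laws bullet, one common parametric form (popularized by the Chinchilla paper discussed in the last reference below) models the loss as a function of the number of parameters $N$ and training tokens $D$; the fitted constants vary between studies:

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$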
📚 **References**:
* [LLMDataHub](https://github.com/Zjh-819/LLMDataHub) by Junhao Zhao: Curated list of datasets for pre-training, fine-tuning, and RLHF.
* [Training a causal language model from scratch](https://huggingface.co/learn/nlp-course/chapter7/6?fw=pt) by Hugging Face: Pre-train a GPT-2 model from scratch using the transformers library.
+* [Megatron-LM](https://github.com/NVIDIA/Megatron-LM): State-of-the-art library to efficiently pre-train models.
* [TinyLlama](https://github.com/jzhang38/TinyLlama) by Zhang et al.: Check this project to get a good understanding of how a Llama model is trained from scratch.
* [Causal language modeling](https://huggingface.co/docs/transformers/tasks/language_modeling) by Hugging Face: Explain the difference between causal and masked language modeling and how to quickly fine-tune a DistilGPT-2 model.
* [Chinchilla's wild implications](https://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications) by nostalgebraist: Discuss the scaling laws and explain what they mean to LLMs in general.