2 years ago · 13f2734e25
--- a/docs/LLM_finetuning.md
+++ b/docs/LLM_finetuning.md
@@ -1,6 +1,6 @@
 
				 ## LLM Fine-Tuning
			
 
				 
			
 
				-Here we discuss fine-tuning Llama 2 with a couple of different recipes. We will cover two scenarios here:
			
 
				+Here we discuss fine-tuning Meta Llama 3 with a couple of different recipes. We will cover two scenarios here:
			
 
				 
			
 
				 
			
 
				 ## 1. **Parameter Efficient Model Fine-Tuning**
			
@@ -42,7 +42,7 @@ You can also keep most of the layers frozen and only fine-tune a few layers. The
 
				 
			
 
				 
			
 
				 
			
 
				-In this scenario depending on the model size, you might need to go beyond one GPU, especially if your model does not fit into one GPU for training. In this case Llama 2 7B parameter won't fit into one gpu.
			
 
				+In this scenario depending on the model size, you might need to go beyond one GPU, especially if your model does not fit into one GPU for training. In this case Meta Llama 3 8B parameter won't fit into one gpu.
			
 
				 A useful way to think about it: you need enough GPU memory to hold the model parameters, gradients, and optimizer states. Each of these can take one or more multiples of (parameter count × bytes per value), where the bytes per value depend on the training precision (fp32: 4 bytes; fp16 and bf16: 2 bytes).
			
 
				 For example, the AdamW optimizer keeps two states (the first- and second-moment estimates) for each parameter, and in many cases these are kept in fp32. As a result, depending on how many layers you train (unfreeze), the memory required can exceed what a single GPU provides.
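
The arithmetic described above can be sketched as a small estimator. This is a rough illustration, not part of llama-recipes: the function name, the nominal count of 8 billion parameters, and the choice of bf16 weights and gradients with fp32 AdamW states are all assumptions, and activations and temporary buffers are ignored.

```python
# Bytes per value for the precisions mentioned in the text.
BYTES_PER_DTYPE = {"fp32": 4, "fp16": 2, "bf16": 2}


def training_state_gib(
    num_params: int,
    param_dtype: str = "bf16",
    grad_dtype: str = "bf16",
    optim_dtype: str = "fp32",
    optim_states_per_param: int = 2,  # AdamW keeps two states per parameter
) -> float:
    """Estimate memory (GiB) for parameters, gradients and optimizer states.

    Hypothetical helper for illustration; activations are not counted.
    """
    params = num_params * BYTES_PER_DTYPE[param_dtype]
    grads = num_params * BYTES_PER_DTYPE[grad_dtype]
    optim = num_params * optim_states_per_param * BYTES_PER_DTYPE[optim_dtype]
    return (params + grads + optim) / 2**30


# Assumed nominal size: 8 billion parameters, all of them trainable.
estimate = training_state_gib(8_000_000_000)
print(f"Estimated training state: {estimate:.1f} GiB")
```

Under these assumptions the training state alone is well above the memory of a single 24 GB or 80 GB GPU, which is why full-parameter fine-tuning is spread across several devices.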
			
 
				 
			
--- a/docs/multi_gpu.md
+++ b/docs/multi_gpu.md
@@ -6,9 +6,9 @@ To run fine-tuning on multi-GPUs, we will  make use of two packages:
 
				 
			
 
				 2. [FSDP](https://pytorch.org/tutorials/intermediate/FSDP_adavnced_tutorial.html) which helps us parallelize the training over multiple GPUs. [More details](LLM_finetuning.md/#2-full-partial-parameter-finetuning).
			
 
				 
			
 
				-Given the combination of PEFT and FSDP, we would be able to fine tune a Llama 2 model on multiple GPUs in one node or multi-node.
			
 
				+Given the combination of PEFT and FSDP, we would be able to fine tune a Meta Llama 3 8B model on multiple GPUs in one node or multi-node.
			
 
				 
			
 
				-## Requirements 
			
 
				+## Requirements
			
 
				 To run the examples, make sure to install the llama-recipes package and clone the GitHub repository in order to use the provided [`finetuning.py`](../recipes/finetuning/finetuning.py) script with torchrun (see [README.md](../README.md) for details).
			
 
				 
			
 
				 **Please note that the llama_recipes package installs PyTorch 2.0.1. If you want to run FSDP + PEFT, make sure to install the PyTorch nightlies.**
			
@@ -24,7 +24,7 @@ This runs with the `samsum_dataset` for summarization application by default.
 
				 
			
 
				 ```bash
			
 
				 
			
 
				-torchrun --nnodes 1 --nproc_per_node 4  examples/finetuning.py --enable_fsdp --model_name /patht_of_model_folder/7B --use_peft --peft_method lora --output_dir Path/to/save/PEFT/model
			
 
				+torchrun --nnodes 1 --nproc_per_node 4  examples/finetuning.py --enable_fsdp --model_name /patht_of_model_folder/8B --use_peft --peft_method lora --output_dir Path/to/save/PEFT/model
			
 
				 
			
 
				 ```
			
 
				 
			
@@ -43,7 +43,7 @@ We use `torchrun` here to spawn multiple processes for FSDP.
 
				 Setting `use_fast_kernels` enables Flash Attention or Xformer memory-efficient kernels, depending on the hardware being used, which speeds up the fine-tuning job. This is available in the `optimum` library from HuggingFace as a one-liner API; read more [here](https://pytorch.org/blog/out-of-the-box-acceleration/).
			
 
				 
			
 
				 ```bash
			
 
				-torchrun --nnodes 1 --nproc_per_node 4  examples/finetuning.py --enable_fsdp --model_name /patht_of_model_folder/7B --use_peft --peft_method lora --output_dir Path/to/save/PEFT/model --use_fast_kernels
			
 
				+torchrun --nnodes 1 --nproc_per_node 4  examples/finetuning.py --enable_fsdp --model_name /patht_of_model_folder/8B --use_peft --peft_method lora --output_dir Path/to/save/PEFT/model --use_fast_kernels
			
 
				 ```
			
 
				 
			
 
				 ### Fine-tuning using FSDP Only
			
@@ -52,7 +52,7 @@ If interested in running full parameter finetuning without making use of PEFT me
 
				 
			
 
				 ```bash
			
 
				 
			
 
				-torchrun --nnodes 1 --nproc_per_node 8  examples/finetuning.py --enable_fsdp --model_name /patht_of_model_folder/7B --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --pure_bf16 --use_fast_kernels
			
 
				+torchrun --nnodes 1 --nproc_per_node 8  examples/finetuning.py --enable_fsdp --model_name /patht_of_model_folder/8B --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --pure_bf16 --use_fast_kernels
			
 
				 
			
 
				 ```
			
 
				 
			
@@ -95,16 +95,16 @@ To run with each of the datasets set the `dataset` flag in the command as shown
 
				 
			
 
				 ```bash
			
 
				 # grammar_dataset
			
 
				-torchrun --nnodes 1 --nproc_per_node 4  examples/finetuning.py --enable_fsdp  --model_name /patht_of_model_folder/7B --use_peft --peft_method lora --dataset grammar_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned  --pure_bf16 --output_dir Path/to/save/PEFT/model
			
 
				+torchrun --nnodes 1 --nproc_per_node 4  examples/finetuning.py --enable_fsdp  --model_name /patht_of_model_folder/8B --use_peft --peft_method lora --dataset grammar_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned  --pure_bf16 --output_dir Path/to/save/PEFT/model
			
 
				 
			
 
				 # alpaca_dataset
			
 
				 
			
 
				-torchrun --nnodes 1 --nproc_per_node 4  examples/finetuning.py --enable_fsdp  --model_name /patht_of_model_folder/7B --use_peft --peft_method lora --dataset alpaca_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --pure_bf16 --output_dir Path/to/save/PEFT/model
			
 
				+torchrun --nnodes 1 --nproc_per_node 4  examples/finetuning.py --enable_fsdp  --model_name /patht_of_model_folder/8B --use_peft --peft_method lora --dataset alpaca_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --pure_bf16 --output_dir Path/to/save/PEFT/model
			
 
				 
			
 
				 
			
 
				 # samsum_dataset
			
 
				 
			
 
				-torchrun --nnodes 1 --nproc_per_node 4  examples/finetuning.py --enable_fsdp --model_name /patht_of_model_folder/7B --use_peft --peft_method lora --dataset samsum_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --pure_bf16 --output_dir Path/to/save/PEFT/model
			
 
				+torchrun --nnodes 1 --nproc_per_node 4  examples/finetuning.py --enable_fsdp --model_name /patht_of_model_folder/8B --use_peft --peft_method lora --dataset samsum_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --pure_bf16 --output_dir Path/to/save/PEFT/model
			
 
				 
			
 
				 ```
			
 
				 
			
@@ -116,7 +116,7 @@ It lets us specify the training settings for everything from `model_name` to `da
 
				 
			
 
				 ```python
			
 
				 
			
 
				-model_name: str="PATH/to/LLAMA 2/7B"
			
 
				+model_name: str="PATH/to/Model"
			
 
				 enable_fsdp: bool= False
			
 
				 run_validation: bool=True
			
 
				 batch_size_training: int=4
			
--- a/docs/single_gpu.md
+++ b/docs/single_gpu.md
@@ -6,9 +6,9 @@ To run fine-tuning on a single GPU, we will  make use of two packages
 
				 
			
 
				 2- [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) int8 quantization.
			
 
				 
			
 
				-Given combination of PEFT and Int8 quantization, we would be able to fine_tune a Llama 2 7B model on one consumer grade GPU such as A10.
			
 
				+Given combination of PEFT and Int8 quantization, we would be able to fine_tune a Meta Llama 3 8B model on one consumer grade GPU such as A10.
			
 
				 
			
 
				-## Requirements 
			
 
				+## Requirements
			
 
				 To run the examples, make sure to install the llama-recipes package (See [README.md](../README.md) for details).
			
 
				 
			
 
				 **Please note that the llama-recipes package installs PyTorch 2.0.1. If you want to run FSDP + PEFT, make sure to install the PyTorch nightlies.**
			
@@ -20,7 +20,7 @@ Get access to a machine with one GPU or if using a multi-GPU machine please make
 
				 
			
 
				 ```bash
			
 
				 
			
 
				-python -m llama_recipes.finetuning  --use_peft --peft_method lora --quantization --use_fp16 --model_name /patht_of_model_folder/7B --output_dir Path/to/save/PEFT/model
			
 
				+python -m llama_recipes.finetuning  --use_peft --peft_method lora --quantization --use_fp16 --model_name /patht_of_model_folder/8B --output_dir Path/to/save/PEFT/model
			
 
				 
			
 
				 ```
			
 
				 The args used in the command above are:
			
@@ -51,16 +51,16 @@ to run with each of the datasets set the `dataset` flag in the command as shown
 
				 ```bash
			
 
				 # grammar_dataset
			
 
				 
			
 
				-python -m llama_recipes.finetuning  --use_peft --peft_method lora --quantization  --dataset grammar_dataset --model_name /patht_of_model_folder/7B --output_dir Path/to/save/PEFT/model
			
 
				+python -m llama_recipes.finetuning  --use_peft --peft_method lora --quantization  --dataset grammar_dataset --model_name /patht_of_model_folder/8B --output_dir Path/to/save/PEFT/model
			
 
				 
			
 
				 # alpaca_dataset
			
 
				 
			
 
				-python -m llama_recipes.finetuning  --use_peft --peft_method lora --quantization  --dataset alpaca_dataset --model_name /patht_of_model_folder/7B --output_dir Path/to/save/PEFT/model
			
 
				+python -m llama_recipes.finetuning  --use_peft --peft_method lora --quantization  --dataset alpaca_dataset --model_name /patht_of_model_folder/8B --output_dir Path/to/save/PEFT/model
			
 
				 
			
 
				 
			
 
				 # samsum_dataset
			
 
				 
			
 
				-python -m llama_recipes.finetuning  --use_peft --peft_method lora --quantization  --dataset samsum_dataset --model_name /patht_of_model_folder/7B --output_dir Path/to/save/PEFT/model
			
 
				+python -m llama_recipes.finetuning  --use_peft --peft_method lora --quantization  --dataset samsum_dataset --model_name /patht_of_model_folder/8B --output_dir Path/to/save/PEFT/model
			
 
				 
			
 
				 ```
			
 
				 
			
@@ -72,7 +72,7 @@ It let us specify the training settings, everything from `model_name` to `datase
 
				 
			
 
				 ```python
			
 
				 
			
 
				-model_name: str="PATH/to/LLAMA 2/7B"
			
 
				+model_name: str="PATH/to/Model"
			
 
				 enable_fsdp: bool= False
			
 
				 run_validation: bool=True
			
 
				 batch_size_training: int=4
			
--- a/recipes/finetuning/LLM_finetuning_overview.md
+++ b/recipes/finetuning/LLM_finetuning_overview.md
@@ -1,6 +1,6 @@
 
				 ## LLM Fine-Tuning
			
 
				 
			
 
				-Here we discuss fine-tuning Llama 2 with a couple of different recipes. We will cover two scenarios here:
			
 
				+Here we discuss fine-tuning Meta Llama 3 with a couple of different recipes. We will cover two scenarios here:
			
 
				 
			
 
				 
			
 
				 ## 1. **Parameter Efficient Model Fine-Tuning**
			
@@ -42,7 +42,7 @@ You can also keep most of the layers frozen and only fine-tune a few layers. The
 
				 
			
 
				 
			
 
				 
			
 
				-In this scenario depending on the model size, you might need to go beyond one GPU, especially if your model does not fit into one GPU for training. In this case Llama 2 7B parameter won't fit into one gpu.
			
 
				+In this scenario depending on the model size, you might need to go beyond one GPU, especially if your model does not fit into one GPU for training. In this case Meta Llama 3 8B parameter won't fit into one gpu.
			
 
				 A useful way to think about it: you need enough GPU memory to hold the model parameters, gradients, and optimizer states. Each of these can take one or more multiples of (parameter count × bytes per value), where the bytes per value depend on the training precision (fp32: 4 bytes; fp16 and bf16: 2 bytes).
			
 
				 For example, the AdamW optimizer keeps two states (the first- and second-moment estimates) for each parameter, and in many cases these are kept in fp32. As a result, depending on how many layers you train (unfreeze), the memory required can exceed what a single GPU provides.
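
To illustrate how the number of unfrozen layers changes the picture, here is a minimal sketch. It assumes that all weights stay resident in bf16, that gradients (bf16) and AdamW states (fp32) exist only for trainable parameters, and that activations are ignored. The helper name and the parameter counts are hypothetical.

```python
def partial_finetune_gib(total_params: int, trainable_params: int) -> float:
    """Estimate memory (GiB) when only some parameters are unfrozen.

    Hypothetical helper: bf16 weights and gradients (2 bytes each),
    two fp32 AdamW states (4 bytes each) per trainable parameter.
    """
    weights = total_params * 2          # every weight is kept in memory
    grads = trainable_params * 2        # gradients only for unfrozen weights
    optim = trainable_params * 2 * 4    # two fp32 states per unfrozen weight
    return (weights + grads + optim) / 2**30


total = 8_000_000_000  # assumed nominal model size
for trainable in (0, 1_000_000_000, total):
    gib = partial_finetune_gib(total, trainable)
    print(f"trainable={trainable:>13,d}  ~{gib:.1f} GiB")
```

The estimate grows with the number of trainable parameters, so unfreezing more layers moves the job from a single GPU towards a multi-GPU setup.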
			
 
				 
			
--- a/recipes/finetuning/README.md
+++ b/recipes/finetuning/README.md
@@ -1,15 +1,15 @@
 
				 # Finetuning Llama
			
 
				 
			
 
				-This folder contains instructions to fine-tune Llama 2 on a 
			
 
				+This folder contains instructions to fine-tune Meta Llama 3 on a
			
 
				 * [single-GPU setup](./singlegpu_finetuning.md)
			
 
				-* [multi-GPU setup](./multigpu_finetuning.md) 
			
 
				+* [multi-GPU setup](./multigpu_finetuning.md)
			
 
				 
			
 
				 using the canonical [finetuning script](../../src/llama_recipes/finetuning.py) in the llama-recipes package.
			
 
				 
			
 
				 If you are new to fine-tuning techniques, check out the [LLM fine-tuning overview](./LLM_finetuning_overview.md).
			
 
				 
			
 
				 > [!TIP]
			
 
				-> If you want to try finetuning Llama 2 with Huggingface's trainer, here is a Jupyter notebook with an [example](./huggingface_trainer/peft_finetuning.ipynb)
			
 
				+> If you want to try finetuning Meta Llama 3 with Huggingface's trainer, here is a Jupyter notebook with an [example](./huggingface_trainer/peft_finetuning.ipynb)
			
 
				 
			
 
				 
			
 
				 ## How to configure finetuning settings?
			
@@ -24,7 +24,7 @@ It lets us specify the training settings for everything from `model_name` to `da
 
				 
			
 
				 ```python
			
 
				 
			
 
				-model_name: str="PATH/to/LLAMA 2/7B"
			
 
				+model_name: str="PATH/to/Model"
			
 
				 enable_fsdp: bool= False
			
 
				 run_validation: bool=True
			
 
				 batch_size_training: int=4
			
@@ -82,9 +82,9 @@ save_optimizer: bool=False
 
				 You can enable [W&B](https://wandb.ai/) experiment tracking by using the `use_wandb` flag as shown below. You can change the project name, entity and other `wandb.init` arguments in `wandb_config`.
			
 
				 
			
 
				 ```bash
			
 
				-python -m llama_recipes.finetuning --use_peft --peft_method lora --quantization --model_name /patht_of_model_folder/7B --output_dir Path/to/save/PEFT/model --use_wandb
			
 
				+python -m llama_recipes.finetuning --use_peft --peft_method lora --quantization --model_name /patht_of_model_folder/8B --output_dir Path/to/save/PEFT/model --use_wandb
			
 
				 ```
			
 
				-You'll be able to access a dedicated project or run link on [wandb.ai](https://wandb.ai) and see your dashboard like the one below. 
			
 
				+You'll be able to access a dedicated project or run link on [wandb.ai](https://wandb.ai) and see your dashboard like the one below.
			
 
				 <div style="display: flex;">
			
 
				     <img src="../../docs/images/wandb_screenshot.png" alt="wandb screenshot" width="500" />
			
 
				 </div>