@@ -9,15 +9,9 @@ To run fine-tuning on multi-GPUs, we will make use of two packages:
 Given the combination of PEFT and FSDP, we would be able to fine tune a Llama 2 model on multiple GPUs in one node or multi-node.
 
 ## Requirements
-To run the examples, make sure to install the requirements using
+To run the examples, make sure to install the llama-recipes package and clone the GitHub repository so you can use the provided [`llama_finetuning.py`](../llama_finetuning.py) script with torchrun (see [README.md](../README.md) for details).
 
-```bash
-
-pip install -r requirements.txt
-
-```
-
-**Please note that the above requirements.txt will install PyTorch 2.0.1 version, in case you want to run FSDP + PEFT, please make sure to install PyTorch nightlies.**
+**Please note that the llama-recipes package will install PyTorch 2.0.1. If you want to run FSDP + PEFT, please make sure to install the PyTorch nightlies.**
 
 ## How to run it
 
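For reference, the install-and-clone step described in the new text above might look like the following (a minimal sketch; the PyPI package name `llama-recipes` and the repository URL below are assumptions, not confirmed by this diff — see the linked README.md for the authoritative steps):

```bash
# Install the packaged library (PyPI name assumed to be "llama-recipes")
pip install llama-recipes

# Clone the repository to get the llama_finetuning.py driver script (URL assumed)
git clone https://github.com/facebookresearch/llama-recipes.git
cd llama-recipes
```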
@@ -30,7 +24,7 @@ This runs with the `samsum_dataset` for summarization application by default.
 
 ```bash
 
-torchrun --nnodes 1 --nproc_per_node 4  ../llama_finetuning.py --enable_fsdp --model_name /patht_of_model_folder/7B --use_peft --peft_method lora --output_dir Path/to/save/PEFT/model
+torchrun --nnodes 1 --nproc_per_node 4 llama_finetuning.py --enable_fsdp --model_name /path_of_model_folder/7B --use_peft --peft_method lora --output_dir Path/to/save/PEFT/model
 
 ```
 
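As a sanity check before launching the command above, `--nproc_per_node` should match the number of GPUs available on the node. A minimal sketch of deriving it automatically (assumes an NVIDIA node with `nvidia-smi` on the PATH; all other flags are copied from the command in the hunk above):

```bash
# Launch one FSDP process per visible GPU (assumes nvidia-smi is available)
NUM_GPUS=$(nvidia-smi -L | wc -l)
torchrun --nnodes 1 --nproc_per_node "$NUM_GPUS" llama_finetuning.py \
  --enable_fsdp \
  --model_name /path_of_model_folder/7B \
  --use_peft --peft_method lora \
  --output_dir Path/to/save/PEFT/model
```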
@@ -49,7 +43,7 @@ We use `torchrun` here to spawn multiple processes for FSDP.
 Setting `use_fast_kernels` will enable using of Flash Attention or Xformer memory-efficient kernels based on the hardware being used. This would speed up the fine-tuning job. This has been enabled in `optimum` library from HuggingFace as a one-liner API, please read more [here](https://pytorch.org/blog/out-of-the-box-acceleration/).
 
 ```bash
-torchrun --nnodes 1 --nproc_per_node 4  ../llama_finetuning.py --enable_fsdp --model_name /patht_of_model_folder/7B --use_peft --peft_method lora --output_dir Path/to/save/PEFT/model --use_fast_kernels
+torchrun --nnodes 1 --nproc_per_node 4 llama_finetuning.py --enable_fsdp --model_name /path_of_model_folder/7B --use_peft --peft_method lora --output_dir Path/to/save/PEFT/model --use_fast_kernels
 ```
 
 ### Fine-tuning using FSDP Only
@@ -85,14 +79,14 @@ sbatch multi_node.slurm
 
 ## How to run with different datasets?
 
-Currently 4 datasets are supported that can be found in [Datasets config file](../configs/datasets.py).
+Currently 4 datasets are supported; they can be found in the [Datasets config file](../src/llama_recipes/configs/datasets.py).
 
-* `grammar_dataset` : use this [notebook](../ft_datasets/grammar_dataset/grammar_dataset_process.ipynb) to pull and process theJfleg and C4 200M datasets for grammar checking.
+* `grammar_dataset` : use this [notebook](../src/llama_recipes/datasets/grammar_dataset/grammar_dataset_process.ipynb) to pull and process the Jfleg and C4 200M datasets for grammar checking (a headless way to run it is sketched after this hunk).
 
-* `alpaca_dataset` : to get this open source data please download the `aplaca.json` to `ft_dataset` folder.
+* `alpaca_dataset` : to get this open source data, please download `alpaca_data.json` to the `datasets` folder.
 
 ```bash
-wget -P ft_datasets https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json
+wget -P datasets https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json
 ```
 
 * `samsum_dataset`
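As noted in the `grammar_dataset` bullet above, the preparation notebook can also be executed headlessly from the command line. A minimal sketch (assumes Jupyter is installed and the command is run from the repository root, with the notebook at the path linked above):

```bash
# Run the grammar dataset preparation notebook without opening the Jupyter UI
jupyter nbconvert --to notebook --execute \
  src/llama_recipes/datasets/grammar_dataset/grammar_dataset_process.ipynb
```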
@@ -101,22 +95,22 @@ To run with each of the datasets set the `dataset` flag in the command as shown
 
 ```bash
 # grammar_dataset
-torchrun --nnodes 1 --nproc_per_node 4  ../llama_finetuning.py --enable_fsdp  --model_name /patht_of_model_folder/7B --use_peft --peft_method lora --dataset grammar_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned  --pure_bf16 --output_dir Path/to/save/PEFT/model
+torchrun --nnodes 1 --nproc_per_node 4 llama_finetuning.py --enable_fsdp --model_name /path_of_model_folder/7B --use_peft --peft_method lora --dataset grammar_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --pure_bf16 --output_dir Path/to/save/PEFT/model
 
 # alpaca_dataset
 
-torchrun --nnodes 1 --nproc_per_node 4  ../llama_finetuning.py --enable_fsdp  --model_name /patht_of_model_folder/7B --use_peft --peft_method lora --dataset alpaca_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --pure_bf16 --output_dir Path/to/save/PEFT/model
+torchrun --nnodes 1 --nproc_per_node 4 llama_finetuning.py --enable_fsdp --model_name /path_of_model_folder/7B --use_peft --peft_method lora --dataset alpaca_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --pure_bf16 --output_dir Path/to/save/PEFT/model
 
 
 # samsum_dataset
 
-torchrun --nnodes 1 --nproc_per_node 4  ../llama_finetuning.py --enable_fsdp --model_name /patht_of_model_folder/7B --use_peft --peft_method lora --dataset samsum_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --pure_bf16 --output_dir Path/to/save/PEFT/model
+torchrun --nnodes 1 --nproc_per_node 4 llama_finetuning.py --enable_fsdp --model_name /path_of_model_folder/7B --use_peft --peft_method lora --dataset samsum_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --pure_bf16 --output_dir Path/to/save/PEFT/model
 
 ```
 
 ## Where to configure settings?
 
-* [Training config file](../configs/training.py) is the main config file that helps to specify the settings for our run and can be found in [configs folder](../configs/)
+* [Training config file](../src/llama_recipes/configs/training.py) is the main config file used to specify the settings for a run; it can be found in the [configs folder](../src/llama_recipes/configs/).
 
 It lets us specify the training settings for everything from `model_name` to `dataset_name`, `batch_size` and so on. Below is the list of supported settings:
 
@@ -149,11 +143,11 @@ save_optimizer: bool=False
 
 ```
 
-* [Datasets config file](../configs/datasets.py) provides the available options for datasets.
+* [Datasets config file](../src/llama_recipes/configs/datasets.py) provides the available options for datasets.
 
-* [peft config file](../configs/peft.py) provides the supported PEFT methods and respective settings that can be modified.
+* [PEFT config file](../src/llama_recipes/configs/peft.py) provides the supported PEFT methods and the respective settings that can be modified.
 
-* [FSDP config file](../configs/fsdp.py) provides FSDP settings such as:
+* [FSDP config file](../src/llama_recipes/configs/fsdp.py) provides FSDP settings such as:
 
     * `mixed_precision` boolean flag to specify using mixed precision, defaults to true.
 
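Since the torchrun examples above already override individual config fields from the command line (for example `--pure_bf16` and `--dist_checkpoint_root_folder`), other settings from these config files can presumably be overridden the same way. A sketch (the `batch_size_training` and `num_epochs` field names are assumptions and should be verified against the training config file linked above):

```bash
# Override a few training-config fields on the command line
# (batch_size_training and num_epochs are assumed field names -- verify against training.py)
torchrun --nnodes 1 --nproc_per_node 4 llama_finetuning.py \
  --enable_fsdp \
  --model_name /path_of_model_folder/7B \
  --use_peft --peft_method lora \
  --batch_size_training 4 --num_epochs 3 \
  --output_dir Path/to/save/PEFT/model
```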