
remove 405B ft doc

Matthias Reso 10 months ago
parent
commit
c167945448
3 changed files with 0 additions and 23 deletions
  1. docs/LLM_finetuning.md (+0 -3)
  2. docs/multi_gpu.md (+0 -17)
  3. recipes/quickstart/finetuning/LLM_finetuning_overview.md (+0 -3)

docs/LLM_finetuning.md (+0 -3)

@@ -18,9 +18,6 @@ These methods will address three aspects:
 
 HF [PEFT](https://github.com/huggingface/peft) library provides an easy way of using these methods which we make use of here. Please read more [here](https://huggingface.co/blog/peft).
 
-For large models like Meta Llama 405B LoRA fine-tuning still requires a lot of memory. To decrease the amount of memory needed for fine-tuning we can apply quantization like 8bit or 4bit (QLoRA) quantization.
-
-
 ## 2. **Full/ Partial Parameter Fine-Tuning**
 
 Full parameter fine-tuning has its own advantages, in this method there are multiple strategies that can help:
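
The paragraph removed above pointed at QLoRA (4-bit/8-bit quantized LoRA) as the way to cut fine-tuning memory. For reference, here is a minimal single-GPU sketch using this repo's own CLI: the `--quantization` flags are taken verbatim from the multi_gpu.md hunk below, while the script path, model name, and output directory are placeholders.

```bash
# Hypothetical single-GPU QLoRA run; quantization flags come from the
# removed multi_gpu.md section below. Model name and output dir are
# placeholders — adjust to your setup.
python recipes/quickstart/finetuning/finetuning.py \
  --use_peft --peft_method lora \
  --quantization 4bit --quantization_config.quant_type nf4 \
  --model_name meta-llama/Meta-Llama-3.1-8B \
  --output_dir ./peft-output
```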

docs/multi_gpu.md (+0 -17)

@@ -83,23 +83,6 @@ sbatch recipes/quickstart/finetuning/multi_node.slurm
 # Change the num nodes and GPU per nodes in the script before running.
 
 ```
-### Fine-tuning using FSDP on 405B Model
-
-To fine-tune the Meta Llama 405B model with LoRA on 32xH100, 80 GB GPUs we need to combine 4bit quantization (QLoRA) and FSDP.
-We can achieve this by adding the following environment variables to the slurm script (before the srun command in the bottom).
-
-```bash
-export FSDP_CPU_RAM_EFFICIENT_LOADING=1
-export ACCELERATE_USE_FSDP=1 
-```
-
-Then we need to replace the bottom srun command with the following:
-
-```bash
-srun  torchrun --nproc_per_node 8 --rdzv_id $RANDOM --rdzv_backend c10d --rdzv_endpoint $head_node_ip:29500 ./finetuning.py  --enable_fsdp --use_peft --peft_method lora --quantization 4bit  --quantization_config.quant_type nf4 --mixed_precision False --low_cpu_fsdp
-```
-
-Do not forget to adjust the number of nodes, ntasks and gpus-per-task in the top.
 
 ## How to run with different datasets?
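
For context on what the removed instructions assembled: 32xH100 works out to 4 nodes of 8 GPUs, the two exports go before the `srun` command, and the `srun torchrun` line replaces the original one. A hedged skeleton of the resulting slurm script follows; the SBATCH counts are illustrative and `$head_node_ip` is assumed to be resolved earlier in the real multi_node.slurm script.

```bash
#!/bin/bash
# Illustrative skeleton of multi_node.slurm with the removed edits applied.
# 4 nodes x 8 GPUs = 32 GPUs as in the removed text; adjust nodes, ntasks
# and gpus-per-task for your cluster.
#SBATCH --nodes=4
#SBATCH --ntasks=4
#SBATCH --gpus-per-task=8

# Environment variables from the removed section, set before srun:
export FSDP_CPU_RAM_EFFICIENT_LOADING=1
export ACCELERATE_USE_FSDP=1

# $head_node_ip is computed earlier in the actual script (assumption here).
srun torchrun --nproc_per_node 8 --rdzv_id $RANDOM --rdzv_backend c10d \
  --rdzv_endpoint $head_node_ip:29500 ./finetuning.py \
  --enable_fsdp --use_peft --peft_method lora \
  --quantization 4bit --quantization_config.quant_type nf4 \
  --mixed_precision False --low_cpu_fsdp
```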
 

recipes/quickstart/finetuning/LLM_finetuning_overview.md (+0 -3)

@@ -18,9 +18,6 @@ These methods will address three aspects:
 
 HF [PEFT](https://github.com/huggingface/peft) library provides an easy way of using these methods which we make use of here. Please read more [here](https://huggingface.co/blog/peft).
 
-For large models like Meta Llama 405B LoRA fine-tuning still requires a lot of memory. To decrease the amount of memory needed for fine-tuning we can apply quantization like 8bit or 4bit (QLoRA) quantization.
-
-
 ## 2. **Full/ Partial Parameter Fine-Tuning**
 
 Full parameter fine-tuning has its own advantages, in this method there are multiple strategies that can help: