
fix small errors

Kai Wu 2 months ago
commit 46cb6baee9
1 changed file with 14 additions and 8 deletions

+ 14 - 8
getting-started/finetuning/finetune_llama4.md

@@ -1,6 +1,6 @@
 ## Fine-Tuning Tutorial for Llama4 Models with torchtune

-This tutorial shows how to perform Low-Rank Adaptation (LoRA) fine-tuning on Llama4 models using torchtune, based on the recent PR adding LoRA support for Llama4.
+This tutorial shows how to perform fine-tuning on Llama4 models using [torchtune](https://github.com/pytorch/torchtune?tab=readme-ov-file).

 ### Prerequisites

@@ -9,11 +9,11 @@ This tutorial shows how to perform Low-Rank Adaptation (LoRA) fine-tuning on Lla
 pip install --pre torchtune --extra-index-url https://download.pytorch.org/whl/nightly/cpu --no-cache-dir
 ```

-2. We also need Hugging Face access token (HF_TOKEN) for model download, please follow the instructions [here](https://huggingface.co/docs/hub/security-tokens) to get your own token.
+2. We also need a Hugging Face access token (HF_TOKEN) to download the models. Please follow the instructions [here](https://huggingface.co/docs/hub/security-tokens) to get your own token. You will also need to request access to the Llama4 models [here](https://huggingface.co/collections/meta-llama/llama-4-67f0c30d9fe03840bc9d0164). A quick, optional way to verify these prerequisites is sketched below.

-### Tutorial
+### Steps
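
Before starting, you can optionally verify the prerequisites. This is a hedged sketch rather than part of the official steps: it assumes the `huggingface-cli` entry point is available (it ships with the `huggingface_hub` package), and the token value is a placeholder.

```bash
# Check that the nightly torchtune build includes the Llama4 recipes/configs
tune ls | grep -i llama4

# Placeholder token; use your own from https://huggingface.co/settings/tokens
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx

# Optional: cache the credential with the Hugging Face CLI
huggingface-cli login --token $HF_TOKEN
```
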
 1. Download Llama4 Weights
-Replace <HF_TOKEN> with your Hugging Face token:
+We will use `meta-llama/Llama-4-Scout-17B-16E-Instruct` as an example here. Replace `$HF_TOKEN` with your Hugging Face token (or export it as an environment variable as shown above):

 ```bash
 tune download meta-llama/Llama-4-Scout-17B-16E-Instruct --output-dir /tmp/Llama-4-Scout-17B-16E-Instruct --hf-token $HF_TOKEN
@@ -27,23 +27,29 @@ tune download meta-llama/Llama-4-Scout-17B-16E-Instruct --output-dir /tmp/Llama-
 This retrieves the model weights and tokenizer from Hugging Face.
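
As an optional sanity check (not part of the original steps; the path matches the `--output-dir` used above), you can list the download directory to confirm the checkpoint shards and tokenizer files are in place:

```bash
# Verify the downloaded weights and tokenizer before launching training
ls -lh /tmp/Llama-4-Scout-17B-16E-Instruct
```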
 
 
 2. Run LoRA Fine-Tuning for Llama4
+
 To run LoRA fine-tuning, use the following command:
 ```bash
 tune run --nproc_per_node 8 lora_finetune_distributed --config llama4/scout_17B_16E_lora
 ```
+This will run LoRA fine-tuning on the Llama4 model with 8 GPUs. It requires around 400GB of GPU memory for Llama4 Scout LoRA fine-tuning.
+
 You can add specific overrides through the command line. For example, to use a larger batch_size:
+
 ```bash
 tune run --nproc_per_node 8 lora_finetune_distributed --config llama4/scout_17B_16E_lora batch_size=4 dataset.packed=True tokenizer.max_seq_len=2048
 ```
-This will run LoRA fine-tuning on Llama4 model with 8 GPUs. It will requires around 400GB gpu memory to do Scout lora fine-tuning.

-The config llama4/scout_17B_16E_lora is a config file that specifies the model, tokenizer, and training parameters. The dataset.packed=True and tokenizer.max_seq_len=2048 are additional arguments that specify the dataset and tokenizer settings.To learn more about the available options, please refer to the [YAML config documentation](https://pytorch.org/torchtune/stable/deep_dives/configs.html#config-tutorial-label)
+The `llama4/scout_17B_16E_lora` config specifies the model, tokenizer, and training parameters. `dataset.packed=True` and `tokenizer.max_seq_len=2048` are additional overrides that control the dataset and tokenizer settings. To learn more about the available options, please refer to the [YAML config documentation](https://pytorch.org/torchtune/stable/deep_dives/configs.html#config-tutorial-label).
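
If you prefer editing a YAML file over stacking command-line overrides, a hedged sketch using torchtune's `tune cp` command is shown below; the local filename is just an example.

```bash
# Copy the built-in config into the working directory so it can be edited
tune cp llama4/scout_17B_16E_lora ./my_scout_17B_16E_lora.yaml

# Optionally confirm the edited file still parses
tune validate ./my_scout_17B_16E_lora.yaml

# Launch the recipe against the local copy
tune run --nproc_per_node 8 lora_finetune_distributed --config ./my_scout_17B_16E_lora.yaml
```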
 
 
 With this setup, you can efficiently train LoRA adapters on Llama4 models using torchtune’s native recipes.

-3. Full Parameter Fine-Tuning for Llama4 (Optional)
+3. Full Parameter Fine-Tuning for Llama4
+
 To run full parameter fine-tuning, use the following command:
+
 ```bash
 tune run --nnodes 4 --nproc_per_node 8 full_finetune_distributed --config llama4/scout_17B_16E_full batch_size=4 dataset.packed=True tokenizer.max_seq_len=2048
 ```
-This will run full parameter fine-tuning on Llama4 model with 8 GPUs. It will requires around 2200GB gpu memory to do Scout full parameter fine-tuning, which is about 4 8xH100 nodes.
+
+This will run full parameter fine-tuning on the Llama4 model across 4 nodes. It requires around 2200GB of GPU memory for Llama4 Scout full parameter fine-tuning, which is roughly four 8xH100 nodes (see the multi-node launch sketch below).
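
Multi-node runs also need a rendezvous setup so the 4 nodes can find each other. The sketch below is an assumption-laden example, not part of the original tutorial: it relies on `tune run` forwarding torchrun's rendezvous flags, assumes `MASTER_ADDR` is set to the first node's address, and assumes port 29500 is reachable. Cluster schedulers such as Slurm typically handle this for you.

```bash
# Run the same command on each of the 4 nodes; c10d rendezvous assigns node ranks automatically
tune run --nnodes 4 --nproc_per_node 8 \
  --rdzv_backend c10d --rdzv_endpoint $MASTER_ADDR:29500 \
  full_finetune_distributed --config llama4/scout_17B_16E_full batch_size=4 dataset.packed=True tokenizer.max_seq_len=2048
```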