
Update FT README.md with llama 3.3 70b on multiple gpus

Jeff Tang 2 months ago
parent commit be4817c94b
1 changed file with 51 additions and 0 deletions

+ 51 - 0
end-to-end-use-cases/coding/text2sql/fine-tuning/README.md

@@ -166,3 +166,54 @@ llama31-8b-text2sql-peft-quantized-cot
 
 The train loss chart should look like this:
 ![](train_loss_cot.png)
+
+
+## Fine-tuning with Llama 3.3 70B
+
+If you have 8xH100 GPUs, you can use [torchtune](https://github.com/pytorch/torchtune) to fine-tune Llama 3.3 70B on multiple GPUs and then evaluate the fine-tuned model. Note that active development on torchtune has stopped ([details](https://github.com/pytorch/torchtune/issues/2883)), but "torchtune will continue to receive critical bug fixes and security patches during 2025", so we show it here only as one way to fine-tune the larger Llama 3.3 70B.
+
+```
+# install torchtune and its dependencies
+pip install torch torchvision torchao
+pip install torchtune
+# download the Llama 3.3 70B Instruct weights (skipping the original/consolidated checkpoint files)
+tune download meta-llama/Llama-3.3-70B-Instruct --ignore-patterns "original/consolidated*" --output-dir /tmp/Llama-3.3-70B-Instruct
+# clone torchtune for the recipe configs, then switch to the configs folder
+git clone https://github.com/pytorch/torchtune
+cd torchtune/recipes/configs
+```
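+
+Llama 3.3 70B Instruct is a gated model, so the `tune download` step above assumes you have already been granted access and authenticated with Hugging Face; a minimal sketch using the `huggingface_hub` CLI (alternatively, a token can be passed directly to `tune download` via `--hf-token`):
+
+```
+# authenticate once with a Hugging Face token that has access to the gated Llama repos
+huggingface-cli login
+```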
+
+Modify `llama3_3/70B_lora.yaml` as follows:
+
+```
+output_dir: /tmp/torchtune/llama3_3_70B/lora
+
+# Dataset and Sampler
+dataset:
+  _component_: torchtune.datasets.chat_dataset
+  source: json
+  conversation_column: messages
+  conversation_style: openai
+  data_files: train_text2sql_cot_dataset_array.json
+  #split: train
+seed: null
+shuffle: True
+
+# Validation
+run_val_every_n_steps: null  # Change to an integer to enable validation every N steps
+dataset_val:
+  _component_: torchtune.datasets.chat_dataset
+  source: json
+  conversation_column: messages
+  conversation_style: openai
+  data_files: test_text2sql_cot_dataset_array.json
+  #split: validation
+batch_size_val: ${batch_size}
+```
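+
+For reference, `conversation_style: openai` tells torchtune to read each record's `messages` list in the OpenAI chat format. A minimal sketch of what one record in `train_text2sql_cot_dataset_array.json` is expected to look like (the field values here are illustrative, not taken from the actual dataset):
+
+```
+[
+  {
+    "messages": [
+      {"role": "system", "content": "You are a text-to-SQL assistant. Think step by step."},
+      {"role": "user", "content": "Schema: CREATE TABLE singer (...); Question: How many singers do we have?"},
+      {"role": "assistant", "content": "The question asks for a count of rows in singer... SELECT count(*) FROM singer"}
+    ]
+  }
+]
+```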
+
+Then run:
+
+```
+tune run --nproc_per_node 8 lora_finetune_distributed --config llama3_3/70B_lora
+```
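+
+torchtune also accepts `key=value` config overrides on the command line, so individual settings can be tweaked without editing the YAML; for example (keys taken from the config above, values illustrative):
+
+```
+tune run --nproc_per_node 8 lora_finetune_distributed --config llama3_3/70B_lora \
+  output_dir=/tmp/torchtune/llama3_3_70B/lora batch_size=2
+```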
+
+After the fine-tuning completes, cd to the `text2sql/fine-tuning` folder, set `peft_model_path` to `/tmp/torchtune/llama3_3_70B/lora` and `output_dir` to `llama3_3_70B/lora`, then run `vllm serve llama3_3_70B/lora --tensor-parallel-size 8 --max-num-batched-tokens 8192 --max-num-seqs 64`.
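+
+Once the server is up, you can sanity-check it against vLLM's OpenAI-compatible endpoint (served on port 8000 by default; the prompt below is just an illustration):
+
+```
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "llama3_3_70B/lora",
+    "messages": [{"role": "user", "content": "Convert this question to SQL: how many singers do we have?"}]
+  }'
+```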
+
+Finally, in `eval/llama_eval.sh`, set `model='llama3_3_70B/lora'` and run `sh llama_eval.sh`. The accuracy of the fine-tuned Llama 3.3 70B should be around 57.24%, compared with 54.11% for the off-the-shelf Llama 3.3 70B, as shown in the [eval README](../eval#evaluation-results).