torchtune is a PyTorch-native library for easily authoring, fine-tuning and experimenting with LLMs.
torchtune provides:
Here we will show the how to use torchtune to easily finetune meta Llama 3. For more usage details, please check the torchtune repo.
Step 1: Install PyTorch. torchtune is tested with the latest stable PyTorch release as well as the preview nightly version.
Step 2: The latest stable version of torchtune is hosted on PyPI and can be downloaded with the following command:
pip install torchtune
To confirm that the package is installed correctly, you can run the following command:
tune --help
And should see the following output:
usage: tune [-h] {ls,cp,download,run,validate} ...
Welcome to the torchtune CLI!
options:
-h, --help show this help message and exit
...
You can also install the latest and greatest torchtune has to offer by installing a nightly build.
torchtune supports fine-tuning for the Llama3 8B and 70B size models. It currently support LoRA, QLoRA and full fine-tune on a single GPU as well as LoRA and full fine-tune on multiple devices for the 8B model, and LoRA on multiple devices for the 70B model. For all the details, take a look at the tutorial.
Note: The Llama3 LoRA and QLoRA configs default to the instruct fine-tuned models. This is because not all special token embeddings are initialized in the base 8B and 70B models.
In the initial experiments for Llama3-8B, QLoRA has a peak allocated memory of ~9GB while LoRA on a single GPU has a peak allocated memory of ~19GB. To get started, you can use the default configs to kick off training.
We need to clone the torchtune repo to get the configs.
git clone git@github.com:pytorch/torchtune.git
cd torchtune/recipes/configs/
Select which model you want to download, in this example we want to use meta-llama/Meta-Llama-3-8B.
tune download meta-llama/Meta-Llama-3-8B \
--output-dir /tmp/Meta-Llama-3-8B \
--hf-token <HF_TOKEN> \
Tip: Set your environment variable
HF_TOKENor pass in--hf-tokento the command in order to validate your access. You can find your token at https://huggingface.co/settings/tokens
LoRA 8B
tune run lora_finetune_single_device --config llama3/8B_lora_single_device
QLoRA 8B
tune run lora_finetune_single_device --config llama3/8B_qlora_single_device
Full 8B
tune run full_finetune_single_device --config llama3/8B_full_single_device
Full 8B
tune run --nproc_per_node 4 full_finetune_distributed --config llama3/8B_full
LoRA 8B
tune run --nproc_per_node 2 lora_finetune_distributed --config llama3/8B_lora
LoRA 70B
Note that the download command for the Meta-Llama3 70B model slightly differs from download commands for the 8B models. This is because it use the HuggingFace safetensor model format to load the model. To download the 70B model, run
tune download meta-llama/Meta-Llama-3-70b --hf-token <> --output-dir /tmp/Meta-Llama-3-70b --ignore-patterns "original/consolidated*"
Then, a finetune can be kicked off:
tune run --nproc_per_node 8 lora_finetune_distributed --config recipes/configs/llama3/70B_lora.yaml
You can find a full list of all the Llama3 configs here.