
torchtune recipe for Meta Llama 3 finetuning

torchtune is a PyTorch-native library for easily authoring, fine-tuning and experimenting with LLMs.

torchtune provides:

  • Native-PyTorch implementations of popular LLMs using composable and modular building blocks
  • Easy-to-use and hackable training recipes for popular fine-tuning techniques (LoRA, QLoRA) - no trainers, no frameworks, just PyTorch!
  • YAML configs for easily configuring training, evaluation, quantization or inference recipes
  • Built-in support for many popular dataset formats and prompt templates to help you quickly get started with training

Here we will show how to use torchtune to easily fine-tune Meta Llama 3. For more usage details, please check the torchtune repo.

Installation

Step 1: Install PyTorch. torchtune is tested with the latest stable PyTorch release as well as the preview nightly version.
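
For example, to install the latest stable release from PyPI (a minimal sketch; see pytorch.org for the command that matches your platform and CUDA version):

# Install the latest stable PyTorch release from PyPI
pip install torch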

Step 2: The latest stable version of torchtune is hosted on PyPI and can be downloaded with the following command:

pip install torchtune

To confirm that the package is installed correctly, you can run the following command:

tune --help

You should see the following output:

usage: tune [-h] {ls,cp,download,run,validate} ...

Welcome to the torchtune CLI!

options:
  -h, --help            show this help message and exit

...

You can also install the latest and greatest torchtune has to offer by installing a nightly build.
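
As a rough sketch, a nightly build can typically be installed with pip's pre-release flag against the PyTorch nightly index; the exact index URL below is an assumption, so check the torchtune README for the current command:

# Install a pre-release (nightly) torchtune build; the index URL here is an assumption
pip install --pre torchtune --extra-index-url https://download.pytorch.org/whl/nightly/cpu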


Meta Llama3 fine-tuning steps

torchtune supports fine-tuning for the Llama3 8B and 70B size models. It currently supports LoRA, QLoRA, and full fine-tuning on a single GPU, as well as LoRA and full fine-tuning on multiple devices for the 8B model, and LoRA on multiple devices for the 70B model. For all the details, take a look at the tutorial.

Note: The Llama3 LoRA and QLoRA configs default to the instruct fine-tuned models. This is because not all special token embeddings are initialized in the base 8B and 70B models.

In the initial experiments for Llama3-8B, QLoRA has a peak allocated memory of ~9GB while LoRA on a single GPU has a peak allocated memory of ~19GB. To get started, you can use the default configs to kick off training.
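
For example, you can use the ls and cp subcommands from the CLI help above to list the built-in recipes and make a local, editable copy of a default config (the output filename below is just an example):

# List all built-in recipes and their configs
tune ls

# Copy a default config so you can customize it locally
tune cp llama3/8B_qlora_single_device my_8B_qlora.yaml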

torchtune repo download

We need to clone the torchtune repo to get the configs.

git clone git@github.com:pytorch/torchtune.git
cd torchtune/recipes/configs/

Model download

Select which model you want to download. In this example we use meta-llama/Meta-Llama-3-8B.

tune download meta-llama/Meta-Llama-3-8B \
--output-dir /tmp/Meta-Llama-3-8B \
--hf-token <HF_TOKEN>

Tip: Set your environment variable HF_TOKEN or pass in --hf-token to the command in order to validate your access. You can find your token at https://huggingface.co/settings/tokens
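
For example, to set the environment variable once per shell session instead of passing --hf-token on every command:

# Export your Hugging Face access token for the current shell session
export HF_TOKEN=<HF_TOKEN>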

Single GPU finetune

LoRA 8B

tune run lora_finetune_single_device --config llama3/8B_lora_single_device
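
Values in the YAML config can also be overridden from the command line as key=value pairs. The keys below (batch_size, gradient_accumulation_steps) are assumptions based on common torchtune config fields; check your copied config for the exact names:

# Override config values without editing the YAML (key names assume the default config)
tune run lora_finetune_single_device --config llama3/8B_lora_single_device \
batch_size=2 gradient_accumulation_steps=8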

QLoRA 8B

tune run lora_finetune_single_device --config llama3/8B_qlora_single_device

Full 8B

tune run full_finetune_single_device --config llama3/8B_full_single_device

Multi GPU finetune

Full 8B

tune run --nproc_per_node 4 full_finetune_distributed --config llama3/8B_full

LoRA 8B

tune run --nproc_per_node 2 lora_finetune_distributed --config llama3/8B_lora
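
--nproc_per_node sets the number of processes (one per GPU) that the run is launched on, so match it to the GPUs available on your machine. For example, on a 4-GPU machine:

# Same LoRA recipe, scaled to 4 GPUs
tune run --nproc_per_node 4 lora_finetune_distributed --config llama3/8B_lora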

LoRA 70B

Note that the download command for the Meta-Llama3 70B model differs slightly from the download commands for the 8B models. This is because it uses the Hugging Face safetensors model format to load the model. To download the 70B model, run

tune download meta-llama/Meta-Llama-3-70b --hf-token <HF_TOKEN> --output-dir /tmp/Meta-Llama-3-70b --ignore-patterns "original/consolidated*"

Then, a finetune can be kicked off:

tune run --nproc_per_node 8 lora_finetune_distributed --config recipes/configs/llama3/70B_lora.yaml
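
If you have edited the config, you can optionally sanity-check it first with the validate subcommand from the CLI help above (the positional config-path usage here is an assumption; see tune validate --help):

# Check that the config is well-formed before launching an 8-GPU job
tune validate recipes/configs/llama3/70B_lora.yaml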

You can find a full list of all the Llama3 configs in the torchtune repo under recipes/configs/llama3.