@@ -105,7 +105,7 @@ python inference.py --model_name <training_config.output_dir> --peft_model <trai
## Inference with FSDP checkpoints
-In case you have fine-tuned your model with pure FSDP and saved the checkpoints with "SHARDED_STATE_DICT" as shown [here](../../../src/llama_recipes/configs/fsdp.py), you can use this converter script to convert the FSDP Sharded checkpoints into HuggingFace checkpoints. This enables you to use the inference script normally as mentioned above.
+In case you have fine-tuned your model with pure FSDP and saved the checkpoints with "SHARDED_STATE_DICT" as shown [here](../../../src/llama_cookbook/configs/fsdp.py), you can use this converter script to convert the FSDP Sharded checkpoints into HuggingFace checkpoints. This enables you to use the inference script normally as mentioned above.
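
The linked config selects this behavior via its `checkpoint_type` field. A minimal sketch of the relevant setting (assumed layout; check the linked file for the exact definition):

```python
# Sketch of the relevant field in src/llama_cookbook/configs/fsdp.py (assumed layout).
from dataclasses import dataclass
from torch.distributed.fsdp import StateDictType

@dataclass
class fsdp_config:
    # SHARDED_STATE_DICT writes one shard per rank; these shards are what the
    # converter script below consolidates into a HuggingFace checkpoint.
    checkpoint_type: StateDictType = StateDictType.SHARDED_STATE_DICT
```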
**To convert the checkpoint, use the following command**:
This is helpful if you have fine-tuned your model using FSDP only, as follows:
@@ -115,7 +115,7 @@ torchrun --nnodes 1 --nproc_per_node 8 recipes/quickstart/finetuning/finetuning
```
Then convert your FSDP checkpoints to HuggingFace checkpoints using:
```bash
- python -m llama_recipes.inference.checkpoint_converter_fsdp_hf --fsdp_checkpoint_path PATH/to/FSDP/Checkpoints --consolidated_model_path PATH/to/save/checkpoints --HF_model_path_or_name PATH/or/HF/model_name
+ python -m llama_cookbook.inference.checkpoint_converter_fsdp_hf --fsdp_checkpoint_path PATH/to/FSDP/Checkpoints --consolidated_model_path PATH/to/save/checkpoints --HF_model_path_or_name PATH/or/HF/model_name
# --HF_model_path_or_name specifies the HF Llama model name or a path containing config.json and tokenizer.json
```
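
After conversion, the consolidated checkpoint loads like any regular HuggingFace model, which is a quick way to verify the conversion (paths are the placeholders from the command above):

```python
# Sanity check: load the converted checkpoint with standard transformers APIs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("PATH/to/save/checkpoints")
tokenizer = AutoTokenizer.from_pretrained("PATH/to/save/checkpoints")
print(model.config.model_type)
```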
@@ -130,4 +130,4 @@ python inference.py --model_name <training_config.output_dir> --prompt_file <tes
## Inference on large models like Meta Llama 405B
The FP8 quantized variants of Meta Llama (i.e. meta-llama/Meta-Llama-3.1-405B-FP8 and meta-llama/Meta-Llama-3.1-405B-Instruct-FP8) can be executed on a single node with 8x80GB H100 using the scripts located in this folder.
-To run the unquantized Meta Llama 405B variants (i.e. meta-llama/Meta-Llama-3.1-405B and meta-llama/Meta-Llama-3.1-405B-Instruct) we need to use a multi-node setup for inference. The llama-recipes inference script currently does not allow multi-node inference. To run this model you can use vLLM with pipeline and tensor parallelism as showed in [this example](../../../3p-integrations/vllm/README.md).
+To run the unquantized Meta Llama 405B variants (i.e. meta-llama/Meta-Llama-3.1-405B and meta-llama/Meta-Llama-3.1-405B-Instruct) we need to use a multi-node setup for inference. The llama-cookbook inference script currently does not support multi-node inference. To run this model you can use vLLM with pipeline and tensor parallelism as shown in [this example](../../../3p-integrations/vllm/README.md).
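
A minimal sketch of such a vLLM setup (the parallelism sizes here are illustrative assumptions, not the linked example's settings, and multi-node execution requires a Ray cluster spanning the nodes):

```python
# Illustrative vLLM setup for the unquantized 405B model across two 8-GPU nodes.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct",
    tensor_parallel_size=8,    # shard each layer across the 8 GPUs of a node (assumed)
    pipeline_parallel_size=2,  # split the layer stack across 2 nodes (assumed)
)
outputs = llm.generate(["Hello, world!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```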