@@ -114,3 +114,32 @@ python inference.py --model_name <training_config.output_dir> --prompt_file <tes

## Inference on large models like Meta Llama 405B

The FP8 quantized variants of Meta Llama (i.e. meta-llama/Meta-Llama-3.1-405B-FP8 and meta-llama/Meta-Llama-3.1-405B-Instruct-FP8) can be executed on a single node with 8x80GB H100 using the scripts located in this folder.

To run the unquantized Meta Llama 405B variants (i.e. meta-llama/Meta-Llama-3.1-405B and meta-llama/Meta-Llama-3.1-405B-Instruct) we need to use a multi-node setup for inference. The llama-recipes inference script currently does not support multi-node inference. To run these models you can use vLLM with pipeline and tensor parallelism as shown in [this example](../../../3p_integrations/vllm/README.md).
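+
+Below is a rough Python sketch of that approach using vLLM's offline `LLM` API. It assumes a Ray cluster spanning two 8xH100 nodes is already running, and the model name and parallelism sizes shown are illustrative; refer to the linked example for the authoritative multi-node setup.
+
+```python
+# Illustrative sketch only: assumes a Ray cluster across two 8-GPU nodes is already up.
+from vllm import LLM, SamplingParams
+
+llm = LLM(
+    model="meta-llama/Meta-Llama-3.1-405B-Instruct",
+    tensor_parallel_size=8,    # shard each layer across the 8 GPUs of a node
+    pipeline_parallel_size=2,  # split the layer stack across the 2 nodes
+)
+
+outputs = llm.generate(
+    ["What is the capital of France?"],
+    SamplingParams(max_tokens=64, temperature=0.6, top_p=0.9),
+)
+print(outputs[0].outputs[0].text)
+```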
+
+### Inference with LoRA checkpoints
+
+After fine-tuning the model, you can use the `code-merge-inference.py` script to generate text from images. The script supports merging PEFT adapter weights from a specified path.
+
+#### Usage
+
+To run the inference script, use the following command:
+
+```bash
+python code-merge-inference.py \
+    --image_path "path/to/your/image.png" \
+    --prompt_text "Your prompt text here" \
+    --temperature 1 \
+    --top_p 0.5 \
+    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
+    --hf_token "your_hugging_face_token" \
+    --finetuning_path "path/to/your/finetuned/model"
+```
+
+#### Script Details
+
+The `code-merge-inference.py` script performs the following steps:
+
+1. **Load Model and Processor**: Loads the pre-trained model and processor, and optionally loads PEFT adapter weights if specified.
+2. **Process Image**: Opens and converts the input image.
+3. **Generate Text**: Generates text from the image using the model and processor.
+
+For more details, refer to the `code-merge-inference.py` script.
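+
+As a rough illustration of those three steps, here is a minimal Python sketch assuming the standard `transformers` and `peft` APIs for Llama 3.2 Vision; the variable names, paths, and generation settings below are illustrative and may differ from what the script actually does:
+
+```python
+# Minimal sketch of the flow described above (not the script itself).
+import torch
+from PIL import Image
+from peft import PeftModel
+from transformers import AutoProcessor, MllamaForConditionalGeneration
+
+model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"
+finetuning_path = "path/to/your/finetuned/model"  # directory containing the PEFT adapter
+
+# 1. Load the base model and processor, then attach the PEFT adapter weights if provided.
+model = MllamaForConditionalGeneration.from_pretrained(
+    model_name, torch_dtype=torch.bfloat16, device_map="auto"
+)
+processor = AutoProcessor.from_pretrained(model_name)
+if finetuning_path:
+    model = PeftModel.from_pretrained(model, finetuning_path)
+
+# 2. Open the input image and convert it to RGB.
+image = Image.open("path/to/your/image.png").convert("RGB")
+
+# 3. Build the chat prompt and generate text conditioned on the image.
+messages = [
+    {"role": "user", "content": [
+        {"type": "image"},
+        {"type": "text", "text": "Your prompt text here"},
+    ]}
+]
+input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
+inputs = processor(image, input_text, return_tensors="pt").to(model.device)
+output = model.generate(**inputs, do_sample=True, temperature=1.0, top_p=0.5, max_new_tokens=512)
+print(processor.decode(output[0], skip_special_tokens=True))
+```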