
add readme

Kai Wu, 6 months ago
parent
commit 0ef6b8d035

+ 2 - 0
recipes/quickstart/finetuning/finetune_vision_model.md

@@ -22,6 +22,8 @@ For **LoRA finetuning with FSDP**, we can run the following code:

 For more details about the finetuning configurations, please read the [finetuning readme](./README.md).

+For more details about local inference with the fine-tuned checkpoint, please read the [Inference with FSDP checkpoints](https://github.com/meta-llama/llama-recipes/tree/main/recipes/quickstart/inference/local_inference#inference-with-fsdp-checkpoints) section to learn how to convert the FSDP weights into a consolidated Hugging Face-formatted model for local inference.
+
 ### How to use a custom dataset to fine-tune vision model

 In order to use a custom dataset, please follow the steps below:
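
The README addition above points to converting the FSDP-sharded weights into a consolidated Hugging Face checkpoint before local inference. As a rough sketch of that step (the `llama_recipes.inference.checkpoint_converter_fsdp_hf` module path and its flags are assumptions here; check the linked README for the exact usage):

```
# Assumed converter module and flags; replace the paths with your own
# FSDP fine-tuning output directory and a target directory for the HF model.
python -m llama_recipes.inference.checkpoint_converter_fsdp_hf \
    --fsdp_checkpoint_path PATH/to/FSDP/model/checkpoints \
    --consolidated_model_path PATH/to/save/the/HF/model \
    --HF_model_path_or_name meta-llama/Llama-3.2-11B-Vision-Instruct
```

The consolidated output directory can then be used as the model path for the local inference scripts.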

+ 6 - 3
recipes/quickstart/inference/local_inference/README.md

@@ -1,11 +1,14 @@
 # Local Inference

+## Hugging Face setup
+**Important Note**: Before running inference, you'll need a Hugging Face access token, which you can create on your Settings page [here](https://huggingface.co/settings/tokens). Then run `huggingface-cli login` and paste your access token to complete the login, so the scripts can download Hugging Face models if needed.
+
 ## Multimodal Inference
-For Multi-Modal inference we have added [multi_modal_infer.py](multi_modal_infer.py) which uses the transformers library
+For Multi-Modal inference we have added [multi_modal_infer.py](multi_modal_infer.py) which uses the transformers library.

-The way to run this would be
+The way to run this would be:
 ```
-python multi_modal_infer.py --image_path "./resources/image.jpg" --prompt_text "Describe this image" --temperature 0.5 --top_p 0.8 --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct"
+python multi_modal_infer.py --image_path PATH_TO_IMAGE --prompt_text "Describe this image" --temperature 0.5 --top_p 0.8 --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct"
 ```

 ## Text-only Inference
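
The new Hugging Face setup note can also be handled non-interactively, for example on a remote machine or in CI. A minimal sketch, assuming a recent `huggingface_hub` CLI and an `HF_TOKEN` environment variable holding the token (both assumptions, not part of this commit):

```
# Log in without the interactive prompt; HF_TOKEN is assumed to hold the
# access token created at https://huggingface.co/settings/tokens
huggingface-cli login --token $HF_TOKEN
```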