@@ -1,11 +1,14 @@
# Local Inference
+## Hugging Face setup
+
+**Important Note**: Before running inference, you'll need your Hugging Face access token, which you can generate on your Settings page [here](https://huggingface.co/settings/tokens). Then run `huggingface-cli login` and paste the token when prompted, so the scripts can download Hugging Face models if needed.
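+
+If an interactive login isn't convenient (for example on a remote machine or in CI), the token can also be passed programmatically through the `huggingface_hub` Python API. The snippet below is a minimal sketch, assuming the token has been exported in an `HF_TOKEN` environment variable (the variable name is just an example):
+
+```
+# Illustrative alternative to `huggingface-cli login`: authenticate with huggingface_hub directly.
+import os
+
+from huggingface_hub import login
+
+# Assumes the access token was exported beforehand, e.g. `export HF_TOKEN=...`.
+login(token=os.environ["HF_TOKEN"])
+```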
+
## Multimodal Inference
-For Multi-Modal inference we have added [multi_modal_infer.py](multi_modal_infer.py) which uses the transformers library
+For multimodal inference, we have added [multi_modal_infer.py](multi_modal_infer.py), which uses the transformers library.
-The way to run this would be
+Run the script as follows:
```
-python multi_modal_infer.py --image_path "./resources/image.jpg" --prompt_text "Describe this image" --temperature 0.5 --top_p 0.8 --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct"
+python multi_modal_infer.py --image_path PATH_TO_IMAGE --prompt_text "Describe this image" --temperature 0.5 --top_p 0.8 --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct"
```
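+
+For reference, the underlying transformers usage that a script like this typically wraps is sketched below for Llama 3.2 Vision. This is a hedged illustration, not the actual contents of [multi_modal_infer.py](multi_modal_infer.py); the script's argument parsing, defaults, and generation settings may differ:
+
+```
+# Rough sketch of multimodal generation with transformers (assumed pattern, not the script itself).
+import torch
+from PIL import Image
+from transformers import AutoProcessor, MllamaForConditionalGeneration
+
+model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
+model = MllamaForConditionalGeneration.from_pretrained(
+    model_id, torch_dtype=torch.bfloat16, device_map="auto"
+)
+processor = AutoProcessor.from_pretrained(model_id)
+
+# Pair the image with the text prompt in a chat-style message.
+image = Image.open("PATH_TO_IMAGE")  # placeholder path, as in the CLI example above
+messages = [
+    {"role": "user", "content": [
+        {"type": "image"},
+        {"type": "text", "text": "Describe this image"},
+    ]}
+]
+prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
+inputs = processor(image, prompt, return_tensors="pt").to(model.device)
+
+# Sample with the same temperature/top_p values used in the CLI example above.
+output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.5, top_p=0.8)
+print(processor.decode(output[0], skip_special_tokens=True))
+```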
## Text-only Inference