@@ -1,11 +1,14 @@
# Local Inference
+## Hugging Face setup
+
+**Important Note**: Before running inference, you'll need your Hugging Face access token, which you can generate on your Settings page [here](https://huggingface.co/settings/tokens). Then run `huggingface-cli login` and paste the token when prompted, so the scripts can download Hugging Face models if needed.
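+
+If an interactive login isn't convenient (for example on a remote machine or in CI), the token can also be passed programmatically through the `huggingface_hub` Python API. The snippet below is a minimal sketch, assuming the token has been exported in an `HF_TOKEN` environment variable (the variable name is just an example):
+
+```
+# Illustrative alternative to `huggingface-cli login`: authenticate with huggingface_hub directly.
+import os
+
+from huggingface_hub import login
+
+# Assumes the access token was exported beforehand, e.g. `export HF_TOKEN=...`.
+login(token=os.environ["HF_TOKEN"])
+```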
+
## Multimodal Inference
-For Multi-Modal inference we have added [multi_modal_infer.py](multi_modal_infer.py) which uses the transformers library
+For multimodal inference, we have added [multi_modal_infer.py](multi_modal_infer.py), which uses the transformers library.
-The way to run this would be
+Run the script as follows:
```
-python multi_modal_infer.py --image_path "./resources/image.jpg" --prompt_text "Describe this image" --temperature 0.5 --top_p 0.8 --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct"
+python multi_modal_infer.py --image_path PATH_TO_IMAGE --prompt_text "Describe this image" --temperature 0.5 --top_p 0.8 --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct"
```
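+
+For reference, the underlying transformers usage that a script like this typically wraps is sketched below for Llama 3.2 Vision. This is a hedged illustration, not the actual contents of [multi_modal_infer.py](multi_modal_infer.py); the script's argument parsing, defaults, and generation settings may differ:
+
+```
+# Rough sketch of multimodal generation with transformers (assumed pattern, not the script itself).
+import torch
+from PIL import Image
+from transformers import AutoProcessor, MllamaForConditionalGeneration
+
+model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
+model = MllamaForConditionalGeneration.from_pretrained(
+    model_id, torch_dtype=torch.bfloat16, device_map="auto"
+)
+processor = AutoProcessor.from_pretrained(model_id)
+
+# Pair the image with the text prompt in a chat-style message.
+image = Image.open("PATH_TO_IMAGE")  # placeholder path, as in the CLI example above
+messages = [
+    {"role": "user", "content": [
+        {"type": "image"},
+        {"type": "text", "text": "Describe this image"},
+    ]}
+]
+prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
+inputs = processor(image, prompt, return_tensors="pt").to(model.device)
+
+# Sample with the same temperature/top_p values used in the CLI example above.
+output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.5, top_p=0.8)
+print(processor.decode(output[0], skip_special_tokens=True))
+```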
## Text-only Inference