This is a complete workshop on labelling images using the new Llama 3.2-Vision models and performing RAG using the image captioning capabilities of the Llama-3.2-11B-Vision-Instruct model.
Before we start:
git clone https://huggingface.co/datasets/Sanyam/MM-Demo
Run the files in the order listed below. The notebooks establish the method for approaching the problem; once it is established, the scripts run that method end to end.
Part_1_Data_Preperation.ipynb
label_script.py
Part_2_Cleaning_Data_and_DB.ipynb
Part_3_RAG_Setup_and_Validation.ipynb
final_demo.py
Here's the detailed outline:
Notebook for Step 1 and Script for Step 1
To run the script:
python scripts/caption_generator.py --hf_token "your_huggingface_token_here" \
--input_path "../images" \
--output_path "/path/to/output/folder" \
--num_gpus 2
The dataset consists of 5000 images with some metadata.
The first half prepares the dataset for labeling.
The second half consists of labeling the dataset. We are bound by an interesting constraint here: the 11B model can only caption one image at a time.
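Because the model captions one image per call, one way to make use of several GPUs is to split the image list into one shard per GPU and let each worker loop over its own shard, one image at a time. The sketch below shows only that sharding step. The helper `shard_paths` and the file names are hypothetical and are not taken from the workshop script; the model call itself is omitted.

```python
from typing import List


def shard_paths(paths: List[str], num_gpus: int) -> List[List[str]]:
    """Split paths into num_gpus round-robin shards (hypothetical helper)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    # Shard i receives items i, i + num_gpus, i + 2 * num_gpus, ...
    return [paths[i::num_gpus] for i in range(num_gpus)]


# Illustrative file names only; the real dataset has about 5000 images.
image_paths = [f"image_{n}.jpg" for n in range(5)]
shards = shard_paths(image_paths, num_gpus=2)

for gpu_id, shard in enumerate(shards):
    # A real worker would load the model on this GPU and caption
    # each image in its shard individually.
    print(gpu_id, shard)
```

Round-robin slicing keeps shard sizes within one item of each other, so no GPU sits idle for long at the end of the run.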
After running the script on the entire dataset, we have more data cleaning to perform.
Even with a prompt that is lengthy (among other things), the model still hallucinates categories and labels, so we need to address this.
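One possible way to handle hallucinated categories is to map every generated value onto a fixed set of allowed labels and flag anything that does not match. The following is a minimal sketch under stated assumptions: the `ALLOWED_CATEGORIES` list and the `normalize_category` helper are hypothetical, and the cleaning notebook may take a different approach.

```python
import difflib
from typing import Iterable, Optional

# Assumed label set for illustration; the workshop's actual categories may differ.
ALLOWED_CATEGORIES = ["Tops", "Bottoms", "Dresses", "Outerwear", "Shoes"]


def normalize_category(
    raw: str, allowed: Iterable[str] = ALLOWED_CATEGORIES
) -> Optional[str]:
    """Map a model-generated category onto the allowed set, or return None."""
    lookup = {name.lower(): name for name in allowed}
    cleaned = raw.strip().lower()
    if cleaned in lookup:
        return lookup[cleaned]
    # Accept near-misses such as singular/plural slips or small typos.
    close = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=0.8)
    return lookup[close[0]] if close else None


for value in [" tops ", "Dress", "Spaceship"]:
    print(repr(value), "->", normalize_category(value))
```

Rows that come back as `None` can then be dropped or re-labeled, depending on how many there are.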
Now, we are ready to try our vector db pipeline:
Notebook for Step 3 and Final Demo Script
With the cleaned descriptions and dataset, we can now store these in a vector DB.
You will note that we are not using the categorization from our model. This is by design, to show how RAG can simplify a lot of things.
We try the approach with different retrieval methods.
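To make the retrieval step concrete, here is a toy in-memory version of similarity search over image descriptions. It is a sketch, not the workshop's pipeline: the bag-of-words `embed` function stands in for a real embedding model, the dictionary stands in for the vector DB, and the descriptions and IDs are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query: str, index: Dict[str, Counter], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k entries whose vectors are most similar to the query."""
    q = embed(query)
    scored = [(item_id, cosine(q, vec)) for item_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up descriptions and IDs, for illustration only.
descriptions = {
    "img_001": "blue denim jacket with silver buttons",
    "img_002": "white cotton summer dress with floral print",
    "img_003": "black leather ankle boots",
}
index = {item_id: embed(text) for item_id, text in descriptions.items()}

results = retrieve("denim jacket", index, k=1)
print(results)
```

Swapping `embed` for a learned embedding model and the dictionary for a vector DB changes the quality and scale of the search, but the query flow (embed, score, take the top k) stays the same.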
Finally, we can bring this all together in a Gradio App.
Task: We can further improve the description prompt. You will notice that the description sometimes starts with the title of the clothing item, which causes retrieval of "similar" clothes instead of "complementary" items.
Credit and Thanks
List of models and resources used in the showcase:
Firstly, thanks to the author for providing the dataset on which we base our exercise.