Llama-3.2-11B model:
This is a complete workshop on how to label images using the new Llama 3.2 Vision models and how to perform RAG using the model's image-captioning capabilities.
Llama-3.2-11B-Vision-Instruct model
Before we start:
git clone https://huggingface.co/datasets/Sanyam/MM-Demo
(Remember to thank the original author by upvoting the Kaggle dataset.)
Order of running files: the notebooks establish the method for approaching the problem. Once it is established, we use the scripts to run the method end to end.
1. Part_1_Data_Preparation.ipynb
2. label_script.py
3. Part_2_Cleaning_Data_and_DB.ipynb
4. Part_3_RAG_Setup_and_Validation.ipynb
5. final_demo.py
Here's the detailed outline:
In this step we start with an unlabeled dataset and use the image captioning capability of the model to write a description of the image and categorize it.
Notebook for Step 1 and Script for Step 1
To run the script (remember to replace N in --num_gpus with the number of GPUs to use):
python scripts/label_script.py --hf_token "your_huggingface_token_here" \
    --input_path "../MM-Demo/images_compressed" \
    --output_path "../MM-Demo/output/" \
    --num_gpus N
The dataset consists of 5000 images with some metadata.
The first half prepares the dataset for labeling.
The second half labels the dataset. The Llama 3.2 11B model can only process one image at a time.
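Because the model takes one image per call, a labeling run over several GPUs amounts to splitting the image list into shards and captioning each shard image by image. The sketch below is a minimal illustration of that idea, not the actual logic of label_script.py: the helper names (`list_images`, `shard`, `label_shard`), the round-robin split, and the `caption_fn` callback are all assumptions, and the callback stands in for the real model call.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(input_path: str) -> List[Path]:
    """Return the image files directly under input_path in a stable, sorted order."""
    folder = Path(input_path)
    return sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


def shard(items: List[Path], num_gpus: int) -> List[List[Path]]:
    """Split items round-robin into num_gpus shards, one per worker."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [items[i::num_gpus] for i in range(num_gpus)]


def label_shard(paths: List[Path], caption_fn: Callable[[Path], str]) -> Dict[str, str]:
    """Caption one image per call, since the model handles a single image at a time."""
    return {p.name: caption_fn(p) for p in paths}


if __name__ == "__main__":
    # Placeholder file names and captions; a real run would call the vision model.
    demo_images = [Path(f"img_{i}.jpg") for i in range(5)]
    for gpu_id, part in enumerate(shard(demo_images, 2)):
        labels = label_shard(part, lambda p: f"placeholder caption for {p.name}")
        print(gpu_id, labels)
```

In a real run each shard would be handed to a separate process pinned to one GPU, with `caption_fn` wrapping the model's generate call.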
After running the script on the entire dataset, we have more data cleaning to perform.
Even after some fun prompt engineering, the model still hallucinates: some outputs contain malformed JSON, and some of the label categories are invented. This step addresses both issues.
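Both problems can be handled with a small post-processing pass. The following is a minimal sketch under stated assumptions rather than the notebook's actual cleaning code: the category list, the `Title` and `Category` keys in the usage example, the `Other` fallback, and the 0.8 similarity cutoff are all hypothetical.

```python
import json
import re
from difflib import get_close_matches
from typing import Optional

# Hypothetical category list; the real label set comes from the labeling prompt.
ALLOWED_CATEGORIES = ["Tops", "Bottoms", "Dresses", "Shoes", "Accessories"]


def extract_json(raw: str) -> Optional[dict]:
    """Parse the span from the first '{' to the last '}' in a model reply.

    Returns None when no such span exists or it is not a valid JSON object.
    """
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def normalize_category(value: str, fallback: str = "Other") -> str:
    """Map a possibly hallucinated category onto the allowed list."""
    lookup = {c.lower(): c for c in ALLOWED_CATEGORIES}
    key = value.strip().lower()
    if key in lookup:
        return lookup[key]
    # Accept near misses (typos, casing); anything else goes to the fallback.
    close = get_close_matches(key, list(lookup), n=1, cutoff=0.8)
    return lookup[close[0]] if close else fallback
```

For example, `extract_json('Sure! {"Title": "Blue shirt", "Category": "tops"}')` yields a dictionary whose `Category` value can then be passed through `normalize_category`. Rows for which `extract_json` returns `None` can be dropped or sent back for relabeling.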
Now, we are ready to try our vector db pipeline:
Notebook for Step 3 and Final Demo Script
With the cleaned descriptions and dataset, we can now store them in a vector DB.
We try the approach with different retrieval methods.
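The core of the retrieval step is: embed each description, store the vectors, then rank stored items by similarity to an embedded query. The sketch below shows that flow using only the standard library. It swaps in a toy hashed bag-of-words embedding and a plain in-memory list, so it is a stand-in for, not a reproduction of, the embedding model and vector DB used in the notebook. All function names are hypothetical.

```python
import hashlib
import math
from typing import Dict, List, Tuple

DIM = 64  # toy dimensionality; a real embedding model has its own fixed size


def embed(text: str) -> List[float]:
    """Toy hashed bag-of-words embedding, L2-normalized (stand-in for a real model)."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(descriptions: Dict[str, str]) -> List[Tuple[str, List[float]]]:
    """Embed every description once and keep (image name, vector) pairs."""
    return [(name, embed(text)) for name, text in descriptions.items()]


def search(index: List[Tuple[str, List[float]]], query: str, k: int = 3) -> List[Tuple[str, float]]:
    """Return the k entries with the highest cosine similarity to the query."""
    q = embed(query)
    scored = [(name, sum(a * b for a, b in zip(q, vec))) for name, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    # Placeholder descriptions; real ones come from the cleaned labeling output.
    index = build_index({
        "a.jpg": "blue denim jacket",
        "b.jpg": "red summer dress",
    })
    print(search(index, "denim jacket", k=1))
```

Because the vectors are normalized, the dot product equals cosine similarity. Swapping `embed` for a real embedding model and the list for a vector DB table changes the storage and quality, not the overall flow.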
Finally, we can bring this all together in a Gradio App.
For running the script:
python scripts/final_demo.py \
    --images_folder "../MM-Demo/compressed_images" \
    --csv_path "../MM-Demo/final_balanced_sample_dataset.csv" \
    --table_path "~/.lancedb" \
    --api_key "your_together_api_key" \
    --default_model "BAAI/bge-large-en-v1.5" \
    --use_existing_table 
Note: We can further improve the description prompt. You will notice that the description sometimes starts with the title of the clothing item, which causes retrieval of "similar" clothes instead of "complementary" items.
Credit and Thanks
List of models and resources used in the showcase:
Firstly, thanks to the author for providing the dataset on which we base this exercise.