Sanyam Bhutani bdb05c1967 Fix Readme		1 year ago

End to End Showcase using Llama models for Multi-Modal RAG

Story Overview: Multi-Modal RAG using `Llama-3.2-11B` model:

Data Labelling and Preparation: We start by downloading 5000 images of clothing items and labelling them using the 11B model
Cleaning Labels: Using the labels generated in the notebook above, we then clean the dataset and prepare it for RAG
Building Vector DB and RAG Pipeline: With the final clean dataset, we can use the descriptions and the 11B model to generate recommendations

Resources used:

List of models and libraries used in the showcase:

Llama-3.2-11B-Vision-Instruct Model
Lance-db for vector database
[This]() Kaggle dataset for building our work
HF Dataset: Since the model's output can be non-deterministic on every run, we will use the uploaded dataset so that everyone gets the same experience
Transformers for the 11B model
Gradio for Demo
Together API for demo

Detailed Outline

Here's the detailed outline:

Step 1: Data Prep and Synthetic Labeling:

The dataset consists of 5000 images with some classification.

The first half is preparing the dataset for labelling:

Clean/Remove corrupt images
EDA to understand existing distribution
Merging up categories of clothes to reduce complexity
Balancing dataset by randomly sampling images
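
The merging and balancing steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's actual code: the `MERGE_MAP` entries, the `merge_and_balance` helper, and the `per_category` cap are all hypothetical names and values chosen for the example, and corrupt-image removal is not covered here.

```python
import random
from collections import defaultdict

# Hypothetical mapping from fine-grained labels to merged categories.
MERGE_MAP = {
    "T-Shirt": "Tops",
    "Shirt": "Tops",
    "Jeans": "Pants",
    "Trousers": "Pants",
}


def merge_and_balance(rows, per_category, seed=0):
    """Merge labels via MERGE_MAP, then sample at most per_category images each.

    rows is a list of (image_path, raw_label) pairs.
    Returns a list of (image_path, merged_label) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_category = defaultdict(list)
    for path, label in rows:
        # Labels without a mapping keep their original name.
        by_category[MERGE_MAP.get(label, label)].append(path)

    balanced = []
    for category, paths in sorted(by_category.items()):
        k = min(per_category, len(paths))
        for path in rng.sample(paths, k):
            balanced.append((path, category))
    return balanced
```

Capping every category at the same size is only one way to balance. Categories with fewer images than the cap simply keep everything they have.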

The second half consists of labelling the dataset. We are bound by an interesting constraint here, since the 11B model can only caption one image at a time:

We load a few images and test captioning
We run this pipeline on random images and iterate on the prompt till we feel the model is giving good outputs
Finally, we can create a script to label all 5000 images across multiple GPUs
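
Because each image is captioned on its own, one simple way to use several GPUs is to give every worker a disjoint slice of the file list. The sketch below shows only that splitting step. The `shard` helper, the file names, and the GPU count of 4 are assumptions for illustration, and the actual labelling script may divide the work differently.

```python
def shard(items, num_shards, shard_index):
    """Return the items that worker shard_index should process.

    Uses a strided slice so shards are disjoint and differ in size by at most one.
    """
    if not 0 <= shard_index < num_shards:
        raise ValueError("shard_index must be in [0, num_shards)")
    return items[shard_index::num_shards]


# Hypothetical file names standing in for the 5000 dataset images.
image_paths = [f"img_{i:04d}.jpg" for i in range(5000)]
num_gpus = 4  # assumed GPU count for the example
shards = [shard(image_paths, num_gpus, i) for i in range(num_gpus)]
```

Each worker would then load the model on its own device and caption the images in its shard one at a time.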

After running the script on the entire dataset, we have more data cleaning to perform:

Step 2: Cleaning up Synthetic Labels and preparing the dataset:

Even after our lengthy (apart from other things) prompt, the model still hallucinates categories and labels, so we need to address this.

Re-balance the dataset by mapping correct categories
Fix Descriptions so that we can create a CSV
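
A rough sketch of both cleanup steps follows: hallucinated categories are mapped onto a fixed set, and descriptions are collapsed to a single line so they fit in a CSV row. The `ALLOWED` set, the `FALLBACK` label, and the column names are assumptions for illustration and are not taken from the notebook.

```python
import csv
import io

ALLOWED = {"Tops", "Pants", "Shoes"}  # hypothetical set of valid categories
FALLBACK = "Other"                    # hypothetical bucket for hallucinated labels


def clean_category(raw):
    """Normalise case and stray punctuation, then reject unknown categories."""
    normalised = raw.strip().strip(".\"'").title()
    return normalised if normalised in ALLOWED else FALLBACK


def clean_description(text):
    """Collapse newlines and repeated spaces into single spaces."""
    return " ".join(text.split())


def to_csv(records):
    """records is a list of (filename, raw_category, raw_description) tuples."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # quotes fields that contain commas
    writer.writerow(["filename", "category", "description"])
    for filename, raw_category, raw_description in records:
        writer.writerow(
            [filename, clean_category(raw_category), clean_description(raw_description)]
        )
    return buffer.getvalue()
```

Routing unknown labels to a single fallback bucket keeps the category column closed. Those rows can then be reviewed by hand or remapped before re-balancing.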

Now, we are ready to try our vector db pipeline:

Step 3: Notebook 3: MM-RAG using lance-db to validate idea

With the cleaned descriptions and dataset, we can now store these in a vector-db

You will note that we are not using the categorisation from our model. This is by design, to show how RAG can simplify a lot of things.

We create embeddings using the text description of our clothes
Use the 11B model to describe the uploaded image
Try to find similar or complementary images based on the upload

We try the approach with different retrieval methods.
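
To make the retrieval idea concrete, here is a toy sketch of "embed the descriptions, embed the query, return the nearest items". It deliberately swaps in a bag-of-words vector and a brute-force cosine search in place of a real embedding model and LanceDB, so it runs with the standard library only. The `embed`, `cosine`, and `top_k` helpers and the catalog entries are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real text embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, catalog, k=2):
    """Return the k catalog keys whose descriptions are closest to the query."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(desc)), name) for name, desc in catalog.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical cleaned descriptions keyed by image file name.
catalog = {
    "img_001.jpg": "blue denim jacket with silver buttons",
    "img_002.jpg": "red floral summer dress",
    "img_003.jpg": "dark blue denim jeans slim fit",
}
```

Calling `top_k("blue denim jacket", catalog)` ranks the jacket first and the jeans second, because they share the most words with the query. In the notebook, the vector store takes over the storage and nearest-neighbour search, and the query text comes from the 11B model's description of the uploaded image.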

Step 4: Gradio App using Together API for Llama-3.2-11B and Lance-db for RAG

Finally, we can bring this all together in a Gradio App.

Task: We can further improve the description prompt. You will notice that the description sometimes starts with the title of the clothing item, which results in the retrieval of "similar" clothes instead of "complementary" items

Upload an image
11B model describes the image
We retrieve complementary clothes to wear based on the description
You can keep the loop going by chatting with the model
