1 year ago · 9950b75d42
--- a/recipes/quickstart/Multi-Modal-RAG/README.md
+++ b/recipes/quickstart/Multi-Modal-RAG/README.md
@@ -13,10 +13,8 @@ This is a complete workshop on labelling images using the new Llama 3.2-Vision M
 Before we start:
 
 1. Please grab your HF CLI Token from [here](https://huggingface.co/settings/tokens)
-2. git clone [this dataset](https://huggingface.co/datasets/Sanyam/MM-Demo) inside the Multi-Modal-RAG folder: `git clone https://huggingface.co/datasets/Sanyam/MM-Demo`
-3. Launch jupyter notebook inside this folder
-4. We will also run two scripts after the notebooks
-5. Make sure you grab a together.ai token [here](https://www.together.ai)
+2. Git clone [this dataset](https://huggingface.co/datasets/Sanyam/MM-Demo) inside the Multi-Modal-RAG folder: `git clone https://huggingface.co/datasets/Sanyam/MM-Demo`
+3. Make sure you grab a together.ai token [here](https://www.together.ai)
 
 ## Detailed Outline for running:
 
@@ -32,6 +30,8 @@ Here's the detailed outline:
 
 ### Step 1: Data Prep and Synthetic Labeling:
 
+In this step we start with an unlabelled dataset and use the image captioning capability of the model to write a description of the image and categorise it.
+
 [Notebook for Step 1](./notebooks/Part_1_Data_Preperation.ipynb) and [Script for Step 1](./scripts/label_script.py)
 
 To run the script (remember to set n):
@@ -46,9 +46,9 @@ The dataset consists of 5000 images with some meta-data.
 
 The first half is preparing the dataset for labeling:
 - Clean/Remove corrupt images
-- EDA to understand existing distribution
+- Some exploratory analysis to understand existing distribution
 - Merging up categories of clothes to reduce complexity 
-- Balancing dataset by randomly sampling images
+- Balancing dataset by randomly sampling images to have an equal distribution for retrieval
 
 Second Half consists of Labeling the dataset. Llama 3.2, 11B model can only process one image at a time:
 - We load a few images and test captioning
@@ -61,9 +61,9 @@ After running the script on the entire dataset, we have more data cleaning to pe
 
 [Notebook for Step 2](./notebooks/Part_2_Cleaning_Data_and_DB.ipynb)
 
-Even after our lengthy (apart from other things) prompt, the model still hallucinates categories and label, here is how we address this
+We notice that even after some fun prompt engineering, the model faces some hallucinations-there are some issues with the JSON formatting and we notice that it hallucinates the label categories. Here is how we address this:
 
-- Re-balance the dataset by mapping correct categories
+- Re-balance the dataset by mapping correct categories. This is useful to make sure we have an equal distribution in our dataset for retrieval
 - Fix Descriptions so that we can create a CSV
 
 Now, we are ready to try our vector db pipeline:
@@ -73,13 +73,13 @@ Now, we are ready to try our vector db pipeline:
 [Notebook for Step 3](./notebooks/Part_3_RAG_Setup_and_Validation.ipynb) and [Final Demo Script](./scripts/label_script.py)
 
 
-With the cleaned descriptions and dataset, we can now store these in a vector-db
+With the cleaned descriptions and dataset, we can now store these in a vector-db, here's the steps:
 
-You will note that we are not using the categorization from our model-this is by design to show how RAG can simplify a lot of things. 
 
 - We create embeddings using the text description of our clothes
 - Use 11-B model to describe the uploaded image
-- Try to find similar or complimentary images based on the upload
+- Ask the model to suggest complementary items to the upload
+- Try to find similar or complementary images based on the upload
 
 We try the approach with different retrieval methods.
 
@@ -96,7 +96,7 @@ python scripts/final_demo.py \
     --use_existing_table 
 ```
 
-Task: We can further improve the description prompt. You will notice sometimes the description starts with the title of the cloth which causes in retrieval of "similar" clothes instead of "complementary" items
+Note: We can further improve the description prompt. You will notice sometimes the description starts with the title of the cloth which causes in retrieval of "similar" clothes instead of "complementary" items
 
 - Upload an image
 - 11B model describes the image