1 year ago · e76e148586
--- a/recipes/quickstart/Multi-Modal-RAG/README.md
+++ b/recipes/quickstart/Multi-Modal-RAG/README.md
@@ -2,8 +2,8 @@
 
 ## Story Overview: Multi-Modal RAG using `Llama-3.2-11B` model: 
 
-- **Data Labelling and Preperation:** We start by downloading 5000 images of clothing items and labelling them using 11B model
-- **Clearning Labels:** With the labels based on the notebook above, we will then clean the dataset and prepare it for RAG
+- **Data Labeling and Preparation:** We start by downloading 5000 images of clothing items and labeling them using 11B model
+- **Cleaning Labels:** With the labels based on the notebook above, we will then clean the dataset and prepare it for RAG
 - **Building Vector DB and RAG Pipeline:** With the final clean dataset, we can use descriptions and 11B model to generate recommendations
 
 ## Resources used: 
@@ -26,13 +26,13 @@ Here's the detailed outline:
 
 The dataset consists of 5000 images with some classification.
 
-The first half is preparing the dataset for labelling:
+The first half is preparing the dataset for labeling:
 - Clean/Remove corrupt images
 - EDA to understand existing distribution
 - Merging up categories of clothes to reduce complexity 
 - Balancing dataset by randomly sampling images
 
-Second Half consists of Labelling the dataset. We are bound by an interesting constraint here, 11B model can only caption one image at a time:
+Second Half consists of Labeling the dataset. We are bound by an interesting constraint here, 11B model can only caption one image at a time:
 - We load a few images and test captioning
 - We run this pipeline on random images and iterate on the prompt till we feel the model is giving good outputs
 - Finally, we can create a script to label all 5000 images on multi-GPU
@@ -52,7 +52,7 @@ Now, we are ready to try our vector db pipeline:
 
 With the cleaned descriptions and dataset, we can now store these in a vector-db
 
-You will note that we are not using the categorisation from our model-this is by design to show how RAG can simplify a lot of things. 
+You will note that we are not using the categorization from our model-this is by design to show how RAG can simplify a lot of things. 
 
 - We create embeddings using the text description of our clothes
 - Use 11-B model to describe the uploaded image
@@ -65,7 +65,7 @@ We try the approach with different retrieval methods.
 
 Finally, we can bring this all together in a Gradio App. 
 
-Task: We can futher improve the description prompt. You will notice sometimes the description starts with the title of the cloth which causes in retrival of "similar" clothes instead of "complementary" items
+Task: We can further improve the description prompt. You will notice sometimes the description starts with the title of the cloth which causes in retrieval of "similar" clothes instead of "complementary" items
 
 - Upload an image
 - 11B model describes the image