|
@@ -1,30 +1,18 @@
|
|
|
# End-to-End Showcase using Llama models for Multi-Modal RAG
|
|
|
|
|
|
-## Story Overview: Multi-Modal RAG using `Llama-3.2-11B` model:
|
|
|
+## Recipe Overview: Multi-Modal RAG using the `Llama-3.2-11B` model:
|
|
|
|
|
|
- **Data Labeling and Preparation:** We start by downloading 5000 images of clothing items and labeling them using the `Llama-3.2-11B-Vision-Instruct` model
|
|
|
- **Cleaning Labels:** Using the labels generated in the notebook above, we then clean the dataset and prepare it for RAG
|
|
|
- **Building Vector DB and RAG Pipeline:** With the final cleaned dataset, we use the descriptions and the 11B model to generate recommendations (see the sketch below)
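As a rough illustration of the labeling step, the snippet below sends one image to a hosted `Llama-3.2-11B-Vision-Instruct` endpoint (here via Together's OpenAI-compatible chat API, listed in the resources section) and asks for a short description. The model id, prompt, and file path are illustrative assumptions, not the exact code from the notebooks.

```python
import base64
from together import Together  # assumes `pip install together` and TOGETHER_API_KEY in the environment

client = Together()

def label_image(path: str) -> str:
    """Ask the vision model to describe one clothing image."""
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        # Hypothetical hosted model id; check the provider's catalog for the exact name.
        model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this clothing item: type, color, material, and style."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(label_image("images/item_0001.jpg"))  # hypothetical path
```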
|
|
|
|
|
|
-## Resources used:
|
|
|
-
|
|
|
-Credit and Thanks to List of models and resources used in the showcase:
|
|
|
-
|
|
|
-Firstly, thanks to the author here for providing this dataset on which we base our exercise []()
|
|
|
-
|
|
|
-- [Llama-3.2-11B-Vision-Instruct Model](https://www.llama.com/docs/how-to-guides/vision-capabilities/)
|
|
|
-- [Lance-db for vector database](https://lancedb.com)
|
|
|
-- [This Kaggle dataset]()
|
|
|
-- [HF Dataset](https://huggingface.co/datasets/Sanyam/MM-Demo) Since output of the model can be non-deterministic every time we run, we will use the uploaded dataset to give a universal experience
|
|
|
-- [Together API for demo](https://www.together.ai)
|
|
|
-
|
|
|
## Detailed Outline
|
|
|
|
|
|
Here's the detailed outline:
|
|
|
|
|
|
### Step 1: Data Prep and Synthetic Labeling:
|
|
|
|
|
|
-The dataset consists of 5000 images with some classification.
|
|
|
+The dataset consists of 5000 images with some metadata.
|
|
|
|
|
|
The first half of the notebook prepares the dataset for labeling:
|
|
|
- Clean/Remove corrupt images
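One simple way to implement the corrupt-image check above (a sketch, not necessarily what the notebook does) is to try opening and verifying each file with Pillow and dropping whatever fails:

```python
from pathlib import Path
from PIL import Image

def find_corrupt_images(image_dir: str) -> list[Path]:
    """Return paths of files that cannot be opened and verified as images."""
    bad = []
    for path in Path(image_dir).glob("*.jpg"):
        try:
            with Image.open(path) as img:
                img.verify()  # raises if the file is truncated or not a valid image
        except Exception:
            bad.append(path)
    return bad

corrupt = find_corrupt_images("images/")  # hypothetical folder layout
print(f"Dropping {len(corrupt)} corrupt files")
```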
|
|
@@ -70,4 +58,16 @@ Task: We can further improve the description prompt. You will notice sometimes t
|
|
|
- Upload an image
|
|
|
- The 11B model describes the image
|
|
|
- We retrieve complementary clothes to wear based on the description
|
|
|
-- You can keep the loop going by chatting with the model
|
|
|
+- You can keep the loop going by chatting with the model
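To make the retrieval step above concrete, here is a minimal LanceDB sketch: embed the cleaned descriptions, store them in a table, then search with the 11B model's description of the uploaded image to find complementary items. The embedding model, table schema, and example data are assumptions for illustration.

```python
import lancedb
from sentence_transformers import SentenceTransformer  # embedding model chosen only for illustration

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# In the recipe these descriptions come from the cleaned synthetic labels.
descriptions = [
    "blue denim jacket, relaxed fit",
    "white linen shirt, casual summer style",
    "black leather ankle boots",
]

db = lancedb.connect("./clothes_db")
table = db.create_table(
    "clothes",
    data=[{"id": i, "description": d, "vector": encoder.encode(d).tolist()}
          for i, d in enumerate(descriptions)],
    mode="overwrite",
)

# At query time, the 11B model's description of the uploaded image becomes the query.
query = "light blue washed denim jeans"
hits = table.search(encoder.encode(query).tolist()).limit(3).to_pandas()
print(hits[["id", "description"]])
```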
|
|
|
+
|
|
|
+## Resources used:
|
|
|
+
|
|
|
+Credit and thanks to the models and resources used in this showcase:
|
|
|
+
|
|
|
+First, thanks to the author for providing the dataset on which this exercise is based: []()
|
|
|
+
|
|
|
+- [Llama-3.2-11B-Vision-Instruct Model](https://www.llama.com/docs/how-to-guides/vision-capabilities/)
|
|
|
+- [LanceDB for the vector database](https://lancedb.com)
|
|
|
+- [This Kaggle dataset]()
|
|
|
+- [HF Dataset](https://huggingface.co/datasets/Sanyam/MM-Demo): since the model's output can be non-deterministic across runs, we use this uploaded dataset to give everyone a consistent experience
|
|
|
+- [Together API for demo](https://www.together.ai)
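Since the uploaded HF dataset is what keeps the walkthrough reproducible, loading it takes one call with the `datasets` library (the split name below is an assumption):

```python
from datasets import load_dataset

# Pre-labeled images and descriptions published for this recipe.
ds = load_dataset("Sanyam/MM-Demo", split="train")  # split name assumed
print(ds[0])
```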
|