Sanyam Bhutani před 11 měsíci
rodič
revize
bdb05c1967
1 změnil soubory, kde provedl 36 přidání a 4 odebrání
  1. 36 4
      recipes/quickstart/Multi-Modal-RAG/README.md

+ 36 - 4
recipes/quickstart/Multi-Modal-RAG/README.md

@@ -14,12 +14,15 @@ List of models and libraries used in the showcase:
 - [Lance-db for vector database](https://lancedb.com)
 - [This]() Kaggle dataset for building our work
 - [HF Dataset](https://huggingface.co/datasets/Sanyam/MM-Demo) Since output of the model can be non-deterministic everytime we run, we will use the uploaded dataset to give a universal experience
+- [Transformers for 11B model](https://github.com/huggingface/transformers) 
+- [Gradio for Demo](https://github.com/gradio-app/gradio)
+- [Together API for demo](https://www.together.ai)
 
 ## Detailed Outline 
 
 Here's the detailed outline:
 
-Step 1: Data Prep and Synthetic Labeling:
+### Step 1: Data Prep and Synthetic Labeling:
 
 The dataset consists of 5000 images with some classification.
 
@@ -36,6 +39,35 @@ Second Half consists of Labelling the dataset. We are bound by an interesting co
 
 After running the script on the entire dataset, we have more data cleaning to perform:
 
-- Step 2: Cleaning up Synthetic Labels and preparing the dataset
-- Step 3: Notebook 3: MM-RAG using lance-db to validate idea
-- Step 4: Gradio App using Together API for Llama-3.2-11B and Lance-db for RAG
+### Step 2: Cleaning up Synthetic Labels and preparing the dataset:
+
+Even after our lengthy (apart from other things) prompt, the model still hallucinates categories and label-we need to address this
+
+- Re-balance the dataset by mapping correct categories
+- Fix Descriptions so that we can create a CSV
+
+Now, we are ready to try our vector db pipeline:
+
+### Step 3: Notebook 3: MM-RAG using lance-db to validate idea
+
+With the cleaned descriptions and dataset, we can now store these in a vector-db
+
+You will note that we are not using the categorisation from our model-this is by design to show how RAG can simplify a lot of things. 
+
+- We create embeddings using the text description of our clothes
+- Use 11-B model to describe the uploaded image
+- Try to find similar or complimentary images based on the upload
+
+We try the approach with different retrieval methods.
+
+
+### Step 4: Gradio App using Together API for Llama-3.2-11B and Lance-db for RAG
+
+Finally, we can bring this all together in a Gradio App. 
+
+Task: We can futher improve the description prompt. You will notice sometimes the description starts with the title of the cloth which causes in retrival of "similar" clothes instead of "complementary" items
+
+- Upload an image
+- 11B model describes the image
+- We retrieve complementary clothes to wear based on the description
+- You can keep the loop going by chatting with the model