Sanyam Bhutani 7 months ago
parent
commit
fa2b2b6732

+ 112 - 4
recipes/quickstart/Multi-Modal-RAG/notebooks/Part_1_Data_Preperation.ipynb

@@ -66,7 +66,13 @@
    "id": "01fbc052-b633-4d7c-a6b8-e8b70c484697",
    "metadata": {},
    "source": [
-    "#### All the imports"
+    "#### All the imports\n",
+    "\n",
+    "We import all the libraries here. \n",
+    "\n",
+    "- PIL: For handling images to be passed to our Llama model\n",
+    "- Huggingface Tranformers: For running the model\n",
+    "- Concurrent Library: Because 405B suggested its useful for speedups and we want to look smart when doing OS stuff :) "
    ]
   },
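The list above can be sketched as a single import cell (a minimal sketch; the third-party imports are guarded here so the snippet degrades gracefully, while the notebook imports them directly):

```python
# Minimal sketch of the imports described above.
from concurrent.futures import ThreadPoolExecutor  # concurrency for speedups

try:
    from PIL import Image                   # image handling for the Llama model
    from transformers import AutoProcessor  # Hugging Face Transformers runner
except ImportError:
    Image = AutoProcessor = None  # required in the notebook, optional here
```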
   {
@@ -99,7 +105,11 @@
    "id": "544c6687-e174-4490-b221-4b3fbed080b3",
    "metadata": {},
    "source": [
-    "#### Clean Corrupt Images"
+    "#### Clean Corrupt Images\n",
+    "\n",
+    "Cleaning corruption is a task for AGI but we can handle the corrupt images in our dataset for now with some concurrency for fast checking. \n",
+    "\n",
+    "This takes a few moments so it might be a good idea to take a small break and socialise for a good change. "
    ]
   },
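A concurrent corruption check along these lines could look like the sketch below (`find_corrupt_images` and the flat folder layout are assumptions; Pillow's `verify()` is only a cheap integrity pass, not a full decode):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def is_corrupt(path):
    """Return the path if the image fails to open/verify, else None."""
    try:
        from PIL import Image  # imported lazily; Pillow assumed installed
        with Image.open(path) as img:
            img.verify()  # cheap integrity check, no full decode
        return None
    except Exception:
        return path

def find_corrupt_images(folder, workers=8):
    """Check every file in `folder` concurrently and collect the bad ones."""
    paths = [os.path.join(folder, f) for f in os.listdir(folder)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [p for p in pool.map(is_corrupt, paths) if p is not None]
```

Threads (rather than processes) are enough here because the work is dominated by file I/O.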
   {
@@ -180,6 +190,14 @@
    ]
   },
   {
+   "cell_type": "markdown",
+   "id": "d339c0d1",
+   "metadata": {},
+   "source": [
+    "Let's load in the Meta-Data of the images and remove the rows with the corrupt images"
+   ]
+  },
+  {
    "cell_type": "code",
    "execution_count": 7,
    "id": "05c65335-ad2f-4735-a25b-d75adb195113",
@@ -295,6 +313,14 @@
    ]
   },
   {
+   "cell_type": "markdown",
+   "id": "cc899cf1",
+   "metadata": {},
+   "source": [
+    "We can now \"clean\" up the dataframe by subtracting the corrupt images."
+   ]
+  },
+  {
    "cell_type": "code",
    "execution_count": 9,
    "id": "1f1e37bb-b625-44ac-b1bb-c2361b5edbf9",
@@ -340,7 +366,11 @@
     "jp-MarkdownHeadingCollapsed": true
    },
    "source": [
-    "## EDA"
+    "## EDA\n",
+    "\n",
+    "Now that we got rid of corruption we can proceed to building a great society with checking our dataset :) \n",
+    "\n",
+    "Let's start by double-checking any empty values"
    ]
   },
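Checking for empties typically reduces to a couple of pandas calls (a sketch on toy data, not the notebook's exact frame):

```python
import pandas as pd

# Toy frame with one missing value to illustrate the check.
df = pd.DataFrame({
    "label": ["shirt", None, "shoes"],
    "color": ["red", "blue", "green"],
})

missing_per_column = df.isnull().sum()  # count of empty values per column
df = df.dropna()                        # drop rows containing any empties
```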
   {
@@ -499,6 +529,16 @@
    ]
   },
   {
+   "cell_type": "markdown",
+   "id": "c65411e6",
+   "metadata": {},
+   "source": [
+    "#### Understanding the Label Distribution \n",
+    "\n",
+    "The existing dataset comes with multi-labels, let's take a look at all categories:"
+   ]
+  },
+  {
    "cell_type": "code",
    "execution_count": 15,
    "id": "fea1f2d8-48c4-4b0e-9790-3427c2517e4e",
@@ -570,6 +610,14 @@
    ]
   },
   {
+   "cell_type": "markdown",
+   "id": "1cc50c67",
+   "metadata": {},
+   "source": [
+    "If we had more ~~prompts~~ time, this would be a fancier plot but for now let's take a look at the distribution skew to understand what's in our dataset:"
+   ]
+  },
+  {
    "cell_type": "code",
    "execution_count": 17,
    "id": "14a86ee1-d419-495b-86b0-7ef193e81b4a",
@@ -598,6 +646,17 @@
    ]
   },
   {
+   "cell_type": "markdown",
+   "id": "a0861297",
+   "metadata": {},
+   "source": [
+    "Let's start with some more cleanup:\n",
+    "\n",
+    "- Remove kids clothing since that is a smaller subset\n",
+    "- Let's use our lack of understanding of fashion to reduce categories and also make our lives with pre-processing easier"
+   ]
+  },
+  {
    "cell_type": "code",
    "execution_count": 18,
    "id": "48a00d85-011d-4632-af7d-d34c8dee6a2c",
@@ -752,6 +811,14 @@
    ]
   },
   {
+   "cell_type": "markdown",
+   "id": "c2793936",
+   "metadata": {},
+   "source": [
+    "For once, lack of fashion knowledge is useful-we can reduce our work by creating less categories. Nicely organised just like an coder's wardrobe"
+   ]
+  },
+  {
    "cell_type": "code",
    "execution_count": 20,
    "id": "99115476-9862-4b92-83f4-dd0145e1ee86",
@@ -825,6 +892,14 @@
    ]
   },
   {
+   "cell_type": "markdown",
+   "id": "a3e0061a",
+   "metadata": {},
+   "source": [
+    "This is the part that makes Thanos happy, we will balance our universe of clothes by randomly sampling."
+   ]
+  },
+  {
    "cell_type": "code",
    "execution_count": 22,
    "id": "43b65158-1865-4535-bba0-610b32811c82",
@@ -934,7 +1009,15 @@
    "id": "5798ee82-e237-4dd4-8a07-7777694a8981",
    "metadata": {},
    "source": [
-    "## Synthetic Labelling using Llama 3.2"
+    "## Synthetic Labelling using Llama 3.2\n",
+    "\n",
+    "All the effort so far was to prepare our dataset for labelling. \n",
+    "\n",
+    "At this stage, we are ready to start labelling the images using Llama-3.2 models. We will use 11B here for testing. \n",
+    "\n",
+    "For our rich readers, we suggest testing 90B as an assignment. Although you will find that 11B is a great candidate for this model. \n",
+    "\n",
+    "Read more about the model capabilites [here](https://www.llama.com/docs/how-to-guides/vision-capabilities/)"
    ]
   },
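For orientation, the chat-style payload for a Llama 3.2 Vision request looks roughly like this. The commented-out calls and the model id follow Hugging Face Transformers conventions; treat them as a sketch, not the notebook's exact code:

```python
# Chat-style message layout for one image plus one instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this clothing item."},
        ],
    }
]

# With the real model you would then run something like:
# from transformers import AutoProcessor, MllamaForConditionalGeneration
# model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
# processor = AutoProcessor.from_pretrained(model_id)
# prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
```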
   {
@@ -981,6 +1064,14 @@
    ]
   },
   {
+   "cell_type": "markdown",
+   "id": "2d97ec1b",
+   "metadata": {},
+   "source": [
+    "Feel free to randomly grab any example from the `ls` command above. This shirt is colorful enough for us to use-so we will go with the current example"
+   ]
+  },
+  {
    "cell_type": "code",
    "execution_count": 27,
    "id": "8112f7bb-377c-4556-90a6-3e576321c152",
@@ -1028,6 +1119,23 @@
    ]
   },
   {
+   "cell_type": "markdown",
+   "id": "f9d3d44f",
+   "metadata": {},
+   "source": [
+    "#### Labelling Prompt\n",
+    "\n",
+    "For anyone who feels strongly about Prompt Engineering-this section is for you. The drama in the first prompt stems from constant errors encountered when running the model. \n",
+    "\n",
+    "Suggested approach:\n",
+    "\n",
+    "- Run a simple prompt on an image\n",
+    "- See output and iterate\n",
+    "\n",
+    "After painfully trying this a few times, we learn that for some reason the model doesn't follow JSON formatting unless it's strongly urged. So we fix this with the dramatic prompt:"
+   ]
+  },
+  {
    "cell_type": "code",
    "execution_count": 30,
    "id": "1de59227-6042-441b-a1f8-b19ce83f7c45",