{ "cells": [ { "cell_type": "markdown", "id": "01af3b74-b3b9-4c1f-b41d-2911e7f19ffe", "metadata": {}, "source": [ "## Data Preparation Notebook\n", "\n", "To keep the experience consistent, we will use [this link](https://huggingface.co/datasets/Sanyam/MM-Demo) to access our dataset. Credit goes to the original author [here](https://www.kaggle.com/datasets/agrigorev/clothing-dataset-full) for making it available. \n", "\n", "As a thank-you to the original author, please upvote the dataset version on Kaggle if you enjoy this course." ] }, { "cell_type": "markdown", "id": "1b4374ff-5104-44be-99ac-c52c94882409", "metadata": {}, "source": [ "## Data Cleanup" ] }, { "cell_type": "markdown", "id": "addccd91-3c83-45bd-83b0-d5147fdc40ff", "metadata": {}, "source": [ "### Removing Corrupt Images\n", "\n", "We will start by cleaning up the dataset and checking for any corrupt images. " ] }, { "cell_type": "markdown", "id": "ee016cfd-5255-4809-96ec-406ee6947e24", "metadata": {}, "source": [ "#### Variables and Paths\n", "\n", "Let's first download the dataset and set our variables to point to it. \n", "\n", "Remember, these paths are something you will need to change, so don't rush the shift+enter fingers yet! Please also set your hf-token in the cell below." ] }, { "cell_type": "code", "execution_count": 1, "id": "af14ba04-bbce-47e4-8d84-3347582b50b9", "metadata": {}, "outputs": [], "source": [ "DATA = \"./DATA/\"\n", "META_DATA = f\"{DATA}images.csv\"\n", "IMAGES = f\"{DATA}images_compressed/\"\n", "\n", "hf_token = \"\"\n", "model_name = \"meta-llama/Llama-3.2-11b-Vision-Instruct\"" ] }, { "cell_type": "markdown", "id": "01fbc052-b633-4d7c-a6b8-e8b70c484697", "metadata": {}, "source": [ "#### All the imports\n", "\n", "We import all the libraries here. 
\n", "\n", "- PIL: For handling images to be passed to our Llama model\n", "- Hugging Face Transformers: For running the model\n", "- concurrent.futures: To parallelize the cleanup so it runs faster" ] }, { "cell_type": "code", "execution_count": 4, "id": "8a93cf35-f360-417a-9754-7c2fcb7121b8", "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas as pd\n", "import numpy as np\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "\n", "from PIL import Image as PIL_Image\n", "from PIL import Image\n", "\n", "from tqdm import tqdm\n", "from concurrent.futures import ProcessPoolExecutor\n", "import multiprocessing\n", "\n", "\n", "import torch\n", "from transformers import MllamaForConditionalGeneration, MllamaProcessor\n" ] }, { "cell_type": "markdown", "id": "544c6687-e174-4490-b221-4b3fbed080b3", "metadata": {}, "source": [ "#### Clean Corrupt Images\n", "\n", "This might take a few moments since we have about 5,000 images in our dataset." ] }, { "cell_type": "code", "execution_count": 5, "id": "36c2a0b8-2953-4194-8cc0-e7338a8865fd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Corrupt images:\n", "./DATA/images_compressed/d028580f-9a98-4fb5-a6c9-5dc362ad3f09.jpg\n", "./DATA/images_compressed/784d67d4-b95e-4abb-baf7-8024f18dc3c8.jpg\n", "./DATA/images_compressed/b72ed5cd-9f5f-49a7-b12e-63a078212a17.jpg\n", "./DATA/images_compressed/1d0129a1-f29a-4a3f-b103-f651176183eb.jpg\n", "./DATA/images_compressed/c60e486d-10ed-4f64-abab-5bb698c736dd.jpg\n", "./DATA/images_compressed/040d73b7-21b5-4cf2-84fc-e1a80231b202.jpg\n", "Total corrupt images found: 6\n" ] } ], "source": [ "def is_image_corrupt(image_path):\n", "    try:\n", "        with Image.open(image_path) as img:\n", "            img.verify()\n", "        return False\n", "    except (IOError, SyntaxError, Image.UnidentifiedImageError):\n", "        return True\n", "\n", "def find_corrupt_images(folder_path):\n", "    image_files = [os.path.join(folder_path, f) for f in os.listdir(folder_path)\n", "                   if f.lower().endswith(('.png', '.jpg', '.jpeg'))]\n", "\n", "    num_cores = multiprocessing.cpu_count()\n", "    with ProcessPoolExecutor(max_workers=num_cores) as executor:\n", "        results = executor.map(is_image_corrupt, image_files)\n", "\n", "    corrupt_images = [img for img, is_corrupt in zip(image_files, results) if is_corrupt]\n", "    return corrupt_images\n", "\n", "\n", "folder_path = IMAGES  # Replace with your folder path\n", "corrupt_images = find_corrupt_images(folder_path)\n", "\n", "print(\"Corrupt images:\")\n", "for img in corrupt_images:\n", "    print(img)\n", "print(f\"Total corrupt images found: {len(corrupt_images)}\")" ] }, { "cell_type": "code", "execution_count": 6, "id": "b0ba27eb-2c5b-447e-9264-bb8db890bc12", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['./DATA/images_compressed/d028580f-9a98-4fb5-a6c9-5dc362ad3f09.jpg',\n", " './DATA/images_compressed/784d67d4-b95e-4abb-baf7-8024f18dc3c8.jpg',\n", " './DATA/images_compressed/b72ed5cd-9f5f-49a7-b12e-63a078212a17.jpg',\n", " './DATA/images_compressed/1d0129a1-f29a-4a3f-b103-f651176183eb.jpg',\n", " './DATA/images_compressed/c60e486d-10ed-4f64-abab-5bb698c736dd.jpg',\n", " './DATA/images_compressed/040d73b7-21b5-4cf2-84fc-e1a80231b202.jpg']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corrupt_images" ] }, { "cell_type": "markdown", "id": "d339c0d1", "metadata": {}, "source": [ "Let's load the metadata of the images and remove the rows for the corrupt images." ] }, { "cell_type": "code", "execution_count": 7, "id": "05c65335-ad2f-4735-a25b-d75adb195113", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
imagesender_idlabelkids
04285fab0-751a-4b74-8e9b-43af05deee22124Not sureFalse
1ea7b6656-3f84-4eb3-9099-23e623fc1018148T-ShirtFalse
200627a3f-0477-401c-95eb-92642cbe078d94Not sureFalse
3ea2ffd4d-9b25-4ca8-9dc2-bd27f1cc59fa43T-ShirtFalse
43b86d877-2b9e-4c8b-a6a2-1d87513309d0189ShoesFalse
\n", "
" ], "text/plain": [ " image sender_id label kids\n", "0 4285fab0-751a-4b74-8e9b-43af05deee22 124 Not sure False\n", "1 ea7b6656-3f84-4eb3-9099-23e623fc1018 148 T-Shirt False\n", "2 00627a3f-0477-401c-95eb-92642cbe078d 94 Not sure False\n", "3 ea2ffd4d-9b25-4ca8-9dc2-bd27f1cc59fa 43 T-Shirt False\n", "4 3b86d877-2b9e-4c8b-a6a2-1d87513309d0 189 Shoes False" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"./DATA/images.csv\")\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 8, "id": "761bda90-4d50-4796-b797-a4236b03adbf", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Corrupt filenames:\n", "['d028580f-9a98-4fb5-a6c9-5dc362ad3f09', '784d67d4-b95e-4abb-baf7-8024f18dc3c8', 'b72ed5cd-9f5f-49a7-b12e-63a078212a17', '1d0129a1-f29a-4a3f-b103-f651176183eb', 'c60e486d-10ed-4f64-abab-5bb698c736dd', '040d73b7-21b5-4cf2-84fc-e1a80231b202']\n" ] } ], "source": [ "corrupt_filenames = [os.path.splitext(os.path.basename(path))[0] for path in corrupt_images]\n", "\n", "# Print out the corrupt filenames for verification\n", "print(\"Corrupt filenames:\")\n", "print(corrupt_filenames)" ] }, { "cell_type": "markdown", "id": "cc899cf1", "metadata": {}, "source": [ "We can now \"clean\" up the dataframe by subtracting the corrupt images." 
] }, { "cell_type": "code", "execution_count": 9, "id": "1f1e37bb-b625-44ac-b1bb-c2361b5edbf9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of rows removed: 5\n", " image sender_id label kids\n", "0 4285fab0-751a-4b74-8e9b-43af05deee22 124 Not sure False\n", "1 ea7b6656-3f84-4eb3-9099-23e623fc1018 148 T-Shirt False\n", "2 00627a3f-0477-401c-95eb-92642cbe078d 94 Not sure False\n", "3 ea2ffd4d-9b25-4ca8-9dc2-bd27f1cc59fa 43 T-Shirt False\n", "4 3b86d877-2b9e-4c8b-a6a2-1d87513309d0 189 Shoes False\n" ] } ], "source": [ "df_clean = df[~df['image'].isin(corrupt_filenames)]\n", "# Print the number of rows removed\n", "print(f\"Number of rows removed: {len(df) - len(df_clean)}\")\n", "\n", "# Display the first few rows of the cleaned DataFrame\n", "print(df_clean.head())" ] }, { "cell_type": "code", "execution_count": 10, "id": "c63199a0-53ee-4249-a57c-c790405e65e5", "metadata": {}, "outputs": [], "source": [ "df_clean.to_csv('clean.csv', index=False)" ] }, { "cell_type": "markdown", "id": "db3d7f11-e5d2-49e3-a607-188f2f43379c", "metadata": {}, "source": [ "## EDA\n", "\n", "Let's start by double-checking for any empty values." ] }, { "cell_type": "code", "execution_count": 11, "id": "225a1df9-6a17-4cef-a8f2-db55ca8647a3", "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\"./clean.csv\")" ] }, { "cell_type": "code", "execution_count": 12, "id": "840693d6-6df3-407e-8090-d1396cbb01c4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 5398 entries, 0 to 5397\n", "Data columns (total 4 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 image 5398 non-null object\n", " 1 sender_id 5398 non-null int64 \n", " 2 label 5398 non-null object\n", " 3 kids 5398 non-null bool \n", "dtypes: bool(1), int64(1), object(2)\n", "memory usage: 131.9+ KB\n" ] } ], "source": [ "df.info()" ] }, { "cell_type": "code", "execution_count": 13, "id": 
"8199d67f-2b60-4a60-8fb0-58ef6b2117d1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
imagesender_idlabelkids
04285fab0-751a-4b74-8e9b-43af05deee22124Not sureFalse
1ea7b6656-3f84-4eb3-9099-23e623fc1018148T-ShirtFalse
200627a3f-0477-401c-95eb-92642cbe078d94Not sureFalse
3ea2ffd4d-9b25-4ca8-9dc2-bd27f1cc59fa43T-ShirtFalse
43b86d877-2b9e-4c8b-a6a2-1d87513309d0189ShoesFalse
\n", "
" ], "text/plain": [ " image sender_id label kids\n", "0 4285fab0-751a-4b74-8e9b-43af05deee22 124 Not sure False\n", "1 ea7b6656-3f84-4eb3-9099-23e623fc1018 148 T-Shirt False\n", "2 00627a3f-0477-401c-95eb-92642cbe078d 94 Not sure False\n", "3 ea2ffd4d-9b25-4ca8-9dc2-bd27f1cc59fa 43 T-Shirt False\n", "4 3b86d877-2b9e-4c8b-a6a2-1d87513309d0 189 Shoes False" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 14, "id": "3363ce8c-3226-4cb2-8e7f-e0b1ec66c13f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Missing values:\n", "image 0\n", "sender_id 0\n", "label 0\n", "kids 0\n", "dtype: int64\n" ] } ], "source": [ "# Check for missing values\n", "print(\"\\nMissing values:\")\n", "print(df.isnull().sum())" ] }, { "cell_type": "markdown", "id": "c65411e6", "metadata": {}, "source": [ "#### Understanding the Label Distribution \n", "\n", "The existing dataset comes with 20 distinct labels; let's take a look at all the categories:" ] }, { "cell_type": "code", "execution_count": 15, "id": "fea1f2d8-48c4-4b0e-9790-3427c2517e4e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Unique labels:\n", "20\n", "\n", " Label Distribution:\n", "label\n", "T-Shirt 1011\n", "Longsleeve 699\n", "Pants 692\n", "Shoes 431\n", "Shirt 378\n", "Dress 357\n", "Outwear 312\n", "Shorts 308\n", "Not sure 228\n", "Hat 171\n", "Skirt 155\n", "Polo 120\n", "Undershirt 118\n", "Blazer 109\n", "Hoodie 100\n", "Body 69\n", "Other 67\n", "Top 43\n", "Blouse 23\n", "Skip 7\n", "Name: count, dtype: int64\n" ] } ], "source": [ "print(\"\\nUnique labels:\")\n", "print(df['label'].nunique())\n", "print(\"\\n Label Distribution:\")\n", "print(df['label'].value_counts())" ] }, { "cell_type": "code", "execution_count": 16, "id": "5b8bb4cc-b45c-4402-a0d4-316b9b250b8b", "metadata": {}, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "\n", "Distribution of kids vs. non-kids images:\n", "kids\n", "False 0.911819\n", "True 0.088181\n", "Name: proportion, dtype: float64\n" ] } ], "source": [ "print(\"\\nDistribution of kids vs. non-kids images:\")\n", "print(df['kids'].value_counts(normalize=True))" ] }, { "cell_type": "markdown", "id": "1cc50c67", "metadata": {}, "source": [ "Let's take a look at the distribution skew to understand what's in our dataset:" ] }, { "cell_type": "code", "execution_count": 17, "id": "14a86ee1-d419-495b-86b0-7ef193e81b4a", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(12, 6))\n", "df['label'].value_counts().head(20).plot(kind='bar')\n", "plt.title('Clothing Labels')\n", "plt.xlabel('Label')\n", "plt.ylabel('Count')\n", "plt.xticks(rotation=45, ha='right')\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "a0861297", "metadata": {}, "source": [ "Let's start with some more cleanup:\n", "\n", "- Remove kids clothing since that is a smaller subset\n", "- Let's use our lack of understanding of fashion to reduce categories and also make our lives with pre-processing easier" ] }, { "cell_type": "code", "execution_count": 18, "id": "48a00d85-011d-4632-af7d-d34c8dee6a2c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original dataset shape: (5398, 4)\n", "Cleaned dataset shape: (4922, 3)\n" ] } ], "source": [ "df_no_kids = df[df['kids'] == False]\n", "df_cleaned = df_no_kids.drop('kids', axis=1)\n", "\n", "print(f\"Original dataset shape: {df.shape}\")\n", "print(f\"Cleaned dataset shape: {df_cleaned.shape}\")" ] }, { "cell_type": "code", "execution_count": 19, "id": "4b8bde55-a9a2-48af-a70c-e794408f5676", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
imagesender_idlabel
04285fab0-751a-4b74-8e9b-43af05deee22124Not sure
1ea7b6656-3f84-4eb3-9099-23e623fc1018148T-Shirt
200627a3f-0477-401c-95eb-92642cbe078d94Not sure
3ea2ffd4d-9b25-4ca8-9dc2-bd27f1cc59fa43T-Shirt
43b86d877-2b9e-4c8b-a6a2-1d87513309d0189Shoes
............
53919bdac063-6c07-4bfc-a04a-e45224c503df204Undershirt
5393dfd4079d-967b-4b3e-8574-fbac11b58103204Shorts
53955379356a-40ee-4890-b416-2336a7d84061310Shorts
539665507fb8-3456-4c15-b53e-d1b03bf71a59204Shoes
539732b99302-cec7-4dec-adfa-3d4029674209204Skirt
\n", "

4922 rows × 3 columns

\n", "
" ], "text/plain": [ " image sender_id label\n", "0 4285fab0-751a-4b74-8e9b-43af05deee22 124 Not sure\n", "1 ea7b6656-3f84-4eb3-9099-23e623fc1018 148 T-Shirt\n", "2 00627a3f-0477-401c-95eb-92642cbe078d 94 Not sure\n", "3 ea2ffd4d-9b25-4ca8-9dc2-bd27f1cc59fa 43 T-Shirt\n", "4 3b86d877-2b9e-4c8b-a6a2-1d87513309d0 189 Shoes\n", "... ... ... ...\n", "5391 9bdac063-6c07-4bfc-a04a-e45224c503df 204 Undershirt\n", "5393 dfd4079d-967b-4b3e-8574-fbac11b58103 204 Shorts\n", "5395 5379356a-40ee-4890-b416-2336a7d84061 310 Shorts\n", "5396 65507fb8-3456-4c15-b53e-d1b03bf71a59 204 Shoes\n", "5397 32b99302-cec7-4dec-adfa-3d4029674209 204 Skirt\n", "\n", "[4922 rows x 3 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = df_cleaned\n", "df" ] }, { "cell_type": "markdown", "id": "c2793936", "metadata": {}, "source": [ "For once, a lack of fashion knowledge is useful: we can reduce our work by creating fewer categories." ] }, { "cell_type": "code", "execution_count": 20, "id": "99115476-9862-4b92-83f4-dd0145e1ee86", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Unique categories after merging:\n", "['Other' 'T-Shirt' 'Shoes' 'Shorts' 'Tops' 'Pants' 'Skirts']\n" ] } ], "source": [ "category_mapping = {\n", "    'T-Shirt': 'T-Shirt',\n", "    'Shoes': 'Shoes',\n", "    'Top': 'Tops',\n", "    'Blouse': 'Tops',\n", "    'Shirt': 'Tops',\n", "    'Polo': 'Tops',\n", "    'Longsleeve': 'Tops',\n", "    'Pants': 'Pants',\n", "    'Jeans': 'Jeans',\n", "    'Shorts': 'Shorts',\n", "    'Skirt': 'Skirts',\n", "    'Dress': 'Skirts',\n", "    'Footwear': 'Shoes',\n", "    'Outwear': 'Tops',\n", "    'Hat': 'Tops',\n", "    'Undershirt': 'T-Shirt',\n", "    'Body': 'Tops',\n", "    'Hoodie': 'Tops',\n", "    'Blazer': 'Tops'\n", "}\n", "\n", "df_cleaned['merged_category'] = df_cleaned['label'].map(category_mapping).fillna('Other')\n", "\n", "# Print the unique categories after merging\n", "print(\"Unique categories after merging:\")\n", 
"print(df_cleaned['merged_category'].unique())" ] }, { "cell_type": "code", "execution_count": 21, "id": "0e9844bf-45a4-460b-bfc2-7800b920ba55", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(12, 6))\n", "df_cleaned['merged_category'].value_counts().plot(kind='bar')\n", "plt.title('Distribution of Merged Clothing Categories')\n", "plt.xlabel('Category')\n", "plt.ylabel('Count')\n", "plt.xticks(rotation=45, ha='right')\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "a3e0061a", "metadata": {}, "source": [ "This is the part that makes Thanos happy, we will balance our universe of clothes by randomly sampling." ] }, { "cell_type": "code", "execution_count": 22, "id": "43b65158-1865-4535-bba0-610b32811c82", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Category counts in the balanced dataset:\n", "merged_category\n", "Pants 500\n", "T-Shirt 500\n", "Tops 500\n", "Skirts 457\n", "Shoes 371\n", "Shorts 284\n", "Other 266\n", "Name: count, dtype: int64\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_2065289/1389168415.py:7: DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. 
Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.\n", " df_balanced = df_cleaned.groupby('merged_category').apply(balance_category).reset_index(drop=True)\n" ] } ], "source": [ "def balance_category(group):\n", "    if len(group) > 500:\n", "        return group.sample(n=500, random_state=42)\n", "    return group\n", "\n", "\n", "df_balanced = df_cleaned.groupby('merged_category').apply(balance_category).reset_index(drop=True)\n", "\n", "# Print the count of each category in the balanced dataset\n", "print(\"\\nCategory counts in the balanced dataset:\")\n", "print(df_balanced['merged_category'].value_counts())" ] }, { "cell_type": "code", "execution_count": 23, "id": "27a59ae8-adf5-4ad1-9e5d-40dad023483d", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Balanced dataset shape: (2878, 4)\n", "merged_category\n", "Pants 500\n", "T-Shirt 500\n", "Tops 500\n", "Skirts 457\n", "Shoes 371\n", "Shorts 284\n", "Other 266\n", "Name: count, dtype: int64\n" ] } ], "source": [ "# Plot the distribution of the balanced dataset\n", "plt.figure(figsize=(12, 6))\n", "df_balanced['merged_category'].value_counts().plot(kind='bar')\n", "plt.title('Distribution of Merged Clothing Categories (Balanced)')\n", "plt.xlabel('Category')\n", "plt.ylabel('Count')\n", "plt.xticks(rotation=45, ha='right')\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "print(f\"Balanced dataset shape: {df_balanced.shape}\")\n", "print(df_balanced['merged_category'].value_counts())" ] }, { "cell_type": "code", "execution_count": 24, "id": "d4883769-292b-40e1-9237-09f354c17225", "metadata": {}, "outputs": [], "source": [ "# Save the balanced dataset\n", "df_balanced.to_csv('balanced_dataset.csv', index=False)" ] }, { "cell_type": "markdown", "id": "5798ee82-e237-4dd4-8a07-7777694a8981", "metadata": {}, "source": [ "## Synthetic Labelling using Llama 3.2\n", "\n", "All the effort so far was to prepare our dataset for labelling. \n", "\n", "At this stage, we are ready to start labelling the images using Llama-3.2 models. We will use 11B here for testing. \n", "\n", "We suggest testing 90B as an assignment, although you will find that 11B is a great candidate for this task. \n", "\n", "Read more about the model capabilities [here](https://www.llama.com/docs/how-to-guides/vision-capabilities/)" ] }, { "cell_type": "code", "execution_count": 25, "id": "180f9da6-fb85-4aa9-8ff7-b532f9fd6a6c", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The model weights are not tied. 
Please use the `tie_weights` method before using the `infer_auto_device` function.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "556bcdba0d4e4554b58eac2ef680d988", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading checkpoint shards: 0%| | 0/5 [00:00" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "image" ] }, { "cell_type": "markdown", "id": "f9d3d44f", "metadata": {}, "source": [ "#### Labelling Prompt\n", "\n", "We did a few sample runs to arrive at the prompt below: \n", "\n", "- Run a simple prompt on an image\n", "- See output and iterate\n", "\n", "After painfully trying this a few times, we learned that for some reason the model doesn't follow JSON formatting unless it's strongly urged. So we fix this with the dramatic prompt:" ] }, { "cell_type": "code", "execution_count": 34, "id": "ab307328-ad5e-436e-a3d5-30bfb8e24a34", "metadata": {}, "outputs": [], "source": [ "USER_TEXT_OPTION = \"\"\"\n", "You are an expert fashion captioner, we are writing descriptions of clothes, look at the image closely and write a caption for it.\n", "\n", "Write the following Title, Size, Category, Gender, Type, Description in JSON FORMAT, PLEASE DO NOT FORGET JSON, \n", "\n", "ALSO START WITH THE JSON AND NOT ANY THING ELSE, FIRST CHAR IN YOUR RESPONSE IS ITS OPENING BRACE\n", "\n", "FOLLOW THESE STEPS CLOSELY WHEN WRITING THE CAPTION: \n", "1. Only start your response with a dictionary like the example below, nothing else, I NEED TO PARSE IT LATER, SO DONT ADD ANYTHING ELSE-IT WILL BREAK MY CODE\n", "Remember-DO NOT SAY ANYTHING ELSE ABOUT WHAT IS GOING ON, just the opening brace is the first thing in your response nothing else ok?\n", "2. REMEMBER TO CLOSE THE DICTIONARY WITH '}'BRACE, IT GOES AFTER THE END OF DESCRIPTION-YOU ALWAYS FORGET IT, THIS WILL CAUSE A LOT OF ISSUES\n", "3. If you cant tell the size from image, guess it! 
its okay but dont literally write that you guessed it\n", "4. Do not make the caption very literal, all of these are product photos, DO NOT CAPTION HOW OR WHERE THEY ARE PLACED, FOCUS ON WRITING ABOUT THE PIECE OF CLOTHING\n", "5. BE CREATIVE WITH THE DESCRIPTION BUT FOLLOW EVERYTHING CLOSELY FOR STRUCTURE\n", "6. Return your answer in dictionary format, see the example below\n", "\n", "{\"Title\": \"Title of item of clothing\", \"Size\": {'S', 'M', 'L', 'XL'}, #select one randomly if you cant tell from the image. DO NOT TELL ME YOU ESTIMATE OR GUESSED IT ONLY THE LETTER IS ENOUGH\", Category\": {T-Shirt, Shoes, Tops, Pants, Jeans, Shorts, Skirts, Shoes, Footwear}, \"Gender\": {M, F, U}, \"Type\": {Casual, Formal, Work Casual, Lounge}, \"Description\": \"Write it here\"}\n", "\n", "Example: ALWAYS RETURN ANSWERS IN THE DICTIONARY FORMAT BELOW OK?\n", "\n", "{\"Title\": \"Casual White pant with logo on it\", \"size\": \"L\", \"Category\": \"Jeans\", \"Gender\": \"U\", \"Type\": \"Work Casual\", \"Description\": \"Write it here, this is where your stuff goes\"} \n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 37, "id": "e6041537-ada0-4b06-92db-725e4da999e0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'end_header_id|>\\n\\n{\"Title\": \"Striped Collared Shirt\", \"Size\": \"L\", \"Category\": \"Tops\", \"Gender\": \"F\", \"Type\": \"Casual\", \"Description\": \"This shirt features a classic design with thin vertical stripes in multiple colors, including red, blue, yellow, and green, giving it a fun and playful look. The collar and cuffs are both long, with the collar being open and unbuttoned, and the cuffs rolled up slightly. The buttons are small and round. The fabric appears to be lightweight, and the shirt appears to be slightly wrinkled, adding to its casual charm. The solid grey background of the image suggests a plain backdrop, and the dark shadows of the shirt hanging on a hanger indicate that it is a product photo. 
Overall, this shirt is perfect for a casual, everyday look, and its fun and playful pattern makes it a great addition to any wardrobe.\"}<|eot_id|>'" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "conversation = [\n", "    {\"role\": \"user\", \"content\": [{\"type\": \"image\"}, {\"type\": \"text\", \"text\": USER_TEXT_OPTION}]}\n", "]\n", "prompt = processor.apply_chat_template(conversation, add_special_tokens=False, add_generation_prompt=True, tokenize=False)\n", "inputs = processor(image, prompt, return_tensors=\"pt\").to(model.device)\n", "output = model.generate(**inputs, temperature=1, top_p=0.9, max_new_tokens=512)\n", "processor.decode(output[0])[len(prompt):]" ] }, { "cell_type": "code", "execution_count": 39, "id": "b4e2add9-7c5c-435c-8a11-505819fc72b6", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "end_header_id|>\n", "\n", "{\"Title\": \"Striped Collared Shirt\", \"Size\": \"L\", \"Category\": \"Tops\", \"Gender\": \"F\", \"Type\": \"Casual\", \"Description\": \"This shirt features a classic design with thin vertical stripes in multiple colors, including red, blue, yellow, and green, giving it a fun and playful look. The collar and cuffs are both long, with the collar being open and unbuttoned, and the cuffs rolled up slightly. The buttons are small and round. The fabric appears to be lightweight, and the shirt appears to be slightly wrinkled, adding to its casual charm. The solid grey background of the image suggests a plain backdrop, and the dark shadows of the shirt hanging on a hanger indicate that it is a product photo. 
Overall, this shirt is perfect for a casual, everyday look, and its fun and playful pattern makes it a great addition to any wardrobe.\"}<|eot_id|>\n" ] } ], "source": [ "print(processor.decode(output[0])[len(prompt):])" ] }, { "cell_type": "markdown", "id": "7d637ab9-3797-46e3-88d1-56e2ef61a8d4", "metadata": {}, "source": [ "### Testing Labelling Script\n", "\n", "The results from labelling above look promising, we can now start building a script skeleton in the notebook to test our label logic.\n", "\n", "Let's test our approach for first 50 images after which we can let this run on multi-GPUs in a script. Remember, Llama-3.2 models can only look at one image at once." ] }, { "cell_type": "code", "execution_count": 43, "id": "562b119c-ceeb-4487-9169-0de89e5b781d", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "10280032bb20491d86d38e99b4db189f", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading checkpoint shards: 0%| | 0/5 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
filenamedescription
0d7ed1d64-2c65-427f-9ae4-eb4aaa3e2389.jpgend_header_id|>\\n\\n**\"A celestial-inspired des...
1  5c1b7a77-1fa3-4af8-9722-cd38e45d89da.jpg  end_header_id|>\\n\\n{\"Title\": \"White Sweater\", ...
2  b2e084c7-e3a0-4182-8671-b908544a7cf2.jpg  end_header_id|>\\n\\n{\"Title\": \"Simple Gray T-Sh...
3  9d053b67-64e1-4050-a509-27332b9eca54.jpg  end_header_id|>\\n\\n{ \"Title\": \"Summer Short Je...
4  d885f493-1070-4d51-bd11-f1ec156a2aa7.jpg  end_header_id|>\\n\\nI cannot provide a response...
5  87846aa9-86cc-404a-af2c-7e8fe941081d.jpg  end_header_id|>\\n\\nI cannot provide a response...
6  22745622-ae32-407f-9af1-9a25eb79d7b3.jpg  end_header_id|>\\n\\n{\"Title\": \"Fitted Baby Cap\"...
7  04fa06fb-d71a-4293-9804-fe799375a682.jpg  end_header_id|>\\n\\nHere is the caption for the...
8  d9e84490-185d-48f9-ac16-4ef3360616d5.jpg  end_header_id|>\\n\\nHere is the description of ...
9  79e8d176-99e8-4f89-88f5-d26d957405a2.jpg  end_header_id|>\\n\\n{\"Title\": \"A t-shirt with a...
10  2125089e-61bc-4ec7-bba5-829e8e2fe268.jpg  end_header_id|>\\n\\nI cannot provide a response...
11  654c92f1-8751-411d-80d4-0c8115af28fc.jpg  end_header_id|>\\n\\n{\"Title\": \"Red Beanie\", \"Si...
12  8f576f1a-839d-4fb2-a224-a4700b2d05da.jpg  end_header_id|>\\n\\n{\"Title\": \"Long Sleeve Shir...
13  e976a8f6-6731-485f-8a9a-2872a5208818.jpg  end_header_id|>\\n\\n{\"Title\": \"Olive Green T-Sh...
14  c6d71611-6bae-4645-8a93-959973e05f8b.jpg  end_header_id|>\\n\\n**Product Description**\\n\\n...
15  5e944547-5d37-4b1a-a544-64a250377e4d.jpg  end_header_id|>\\n\\nI can't assist with that re...
16  95d9e258-ef5b-4cfc-852a-c1f8308cf31f.jpg  end_header_id|>\\n\\n**Title:** \"Playful Cat Pri...
17  177a49f7-8796-45f5-9e3c-e66347db0fe0.jpg  end_header_id|>\\n\\n{\"Title\": \"White top with f...
18  4719349a-06d4-4ddf-8428-69c6cef55e46.jpg  end_header_id|>\\n\\n{\"Title\": \"Rainbow Dress\", ...
19  bbf0d9c7-663d-46d1-a9f8-66e8e5678541.jpg  end_header_id|>\\n\\n{\"Title\": \"White Long-sleev...
20  e25a7faa-7a49-4e72-a7ef-e74427f77784.jpg  end_header_id|>\\n\\nI cannot identify the size ...
21  d995ac1f-fbd0-482c-a308-dafb6a93cfd0.jpg  end_header_id|>\\n\\n{\"Title\": \"Off-White Boatne...
22  f8eb9935-30b5-41a8-95dc-5f05dec84dc6.jpg  end_header_id|>\\n\\n{ \\n \"Title\": \"Gray and ...
23  248aa31e-1a4e-48e3-ac17-8f007a4cf139.jpg  end_header_id|>\\n\\n{\"Title\": \"Business Pants\",...
24  4d42474e-cc0c-4528-84ac-7a543ffa4ee2.jpg  end_header_id|>\\n\\n{\"Title\": \"Rugby Shirt\", \"S...
25  92f5cc49-20c0-4b60-a206-9f26e4b736a7.jpg  end_header_id|>\\n\\n{\"Title\": \"Rose Blouse with...
26  eb93406f-20f2-4840-84e0-aa9febfb2580.jpg  end_header_id|>\\n\\n{\"Title\": \"Pink t-shirt dre...
27  28936a35-8bc6-42b1-b7ed-a5d02aa994df.jpg  end_header_id|>\\n\\n{\"Title\": \"Blue Silk Blouse...
28  0c224954-0e0f-4caa-82c8-cf9581e89336.jpg  end_header_id|>\\n\\n{\"Title\": \"Dark Grey Washed...
29  694e976e-13ee-4471-9f89-ca75244f9334.jpg  end_header_id|>\\n\\n{\"Title\": \"Pink long-sleeve...
30  b988510e-0b76-4b92-a748-5f650ec3cdaa.jpg  end_header_id|>\\n\\n{\"Title\": \"Baby Girl's Dres...
31  123cedc9-f0ee-432b-a087-7c4981e60c75.jpg  end_header_id|>\\n\\n{\"Title\": \"Blue Sleeveless ...
32  1ece3817-f6ca-46a2-b514-f2d3de572399.jpg  end_header_id|>\\n\\n{\"Title\": \"White Sneakers\",...
33  e102d2b6-60a1-427c-94d2-8ab6876cce30.jpg  end_header_id|>\\n\\n{\"Title\": \"John Rick t-shir...
34  5b32191d-85d4-4990-b419-ae928b56597b.jpg  end_header_id|>\\n\\nI cannot provide a response...
35  58caa9fd-a3df-44ac-b296-05dbd8d02d47.jpg  end_header_id|>\\n\\n{\"Title\": \"Colorblock Dress...
36  5f046ec9-4a0c-4c31-a80a-8f71f494d295.jpg  end_header_id|>\\n\\n{\"Title\": \"Blue, Black, and...
37  f7aacbe8-e055-44ea-9eeb-ab89940dd5cb.jpg  end_header_id|>\\n\\n{\"Title\": \"Women's Turquois...
38  049e77f5-2a28-457a-9d27-2edb3da2fd7f.jpg  end_header_id|>\\n\\n{\"Title\": \"Power Full Main\"...
39  72647815-0e0d-4e4b-b320-0ac57d0f1cf4.jpg  end_header_id|>\\n\\nI cannot satisfy your reque...
40  e612e27f-8be5-4dd2-a667-24ed31b3a2ac.jpg  end_header_id|>\\n\\nI cannot generate descripti...
41  c43bcf77-b542-4c9a-a8b3-580e873c2baf.jpg  end_header_id|>\\n\\nHere is the caption for the...
42  77f0e079-e833-4ce9-8220-1142e57de747.jpg  end_header_id|>\\n\\nI cannot confidently identi...
43  be0c33d8-bb30-47a8-a9d8-3ffef3bf5b95.jpg  end_header_id|>\\n\\n**Clothing Details**\\n\\n* *...
44  4cc0191c-175d-48e1-9808-ffbb7ea7ac57.jpg  end_header_id|>\\n\\n**Description of the Tights...
45  5ddd23ad-e060-4e8a-a5ef-f7d3179567ff.jpg  end_header_id|>\\n\\n{\"Title\": \"Pinstripe Collar...
46  43ac2647-95bc-4174-b9c1-0518a5024ed2.jpg  end_header_id|>\\n\\nI cannot endorse or promote...
47  3203d295-92c2-445d-a8de-9423b58ffb66.jpg  end_header_id|>\\n\\n**{\"Title\": \"Casual T-Shirt...
48  24f57638-6748-4f65-8fff-747fac6d003c.jpg  end_header_id|>\\n\\n{\"Title\": \"Elegant Quilted ...
49  ad0352de-3584-4ed7-8e80-083908297db8.jpg  end_header_id|>\\n\\n**JSON Response**\\n\\n{\"Titl...
\n", "" ], "text/plain": [ " filename \\\n", "0 d7ed1d64-2c65-427f-9ae4-eb4aaa3e2389.jpg \n", "1 5c1b7a77-1fa3-4af8-9722-cd38e45d89da.jpg \n", "2 b2e084c7-e3a0-4182-8671-b908544a7cf2.jpg \n", "3 9d053b67-64e1-4050-a509-27332b9eca54.jpg \n", "4 d885f493-1070-4d51-bd11-f1ec156a2aa7.jpg \n", "5 87846aa9-86cc-404a-af2c-7e8fe941081d.jpg \n", "6 22745622-ae32-407f-9af1-9a25eb79d7b3.jpg \n", "7 04fa06fb-d71a-4293-9804-fe799375a682.jpg \n", "8 d9e84490-185d-48f9-ac16-4ef3360616d5.jpg \n", "9 79e8d176-99e8-4f89-88f5-d26d957405a2.jpg \n", "10 2125089e-61bc-4ec7-bba5-829e8e2fe268.jpg \n", "11 654c92f1-8751-411d-80d4-0c8115af28fc.jpg \n", "12 8f576f1a-839d-4fb2-a224-a4700b2d05da.jpg \n", "13 e976a8f6-6731-485f-8a9a-2872a5208818.jpg \n", "14 c6d71611-6bae-4645-8a93-959973e05f8b.jpg \n", "15 5e944547-5d37-4b1a-a544-64a250377e4d.jpg \n", "16 95d9e258-ef5b-4cfc-852a-c1f8308cf31f.jpg \n", "17 177a49f7-8796-45f5-9e3c-e66347db0fe0.jpg \n", "18 4719349a-06d4-4ddf-8428-69c6cef55e46.jpg \n", "19 bbf0d9c7-663d-46d1-a9f8-66e8e5678541.jpg \n", "20 e25a7faa-7a49-4e72-a7ef-e74427f77784.jpg \n", "21 d995ac1f-fbd0-482c-a308-dafb6a93cfd0.jpg \n", "22 f8eb9935-30b5-41a8-95dc-5f05dec84dc6.jpg \n", "23 248aa31e-1a4e-48e3-ac17-8f007a4cf139.jpg \n", "24 4d42474e-cc0c-4528-84ac-7a543ffa4ee2.jpg \n", "25 92f5cc49-20c0-4b60-a206-9f26e4b736a7.jpg \n", "26 eb93406f-20f2-4840-84e0-aa9febfb2580.jpg \n", "27 28936a35-8bc6-42b1-b7ed-a5d02aa994df.jpg \n", "28 0c224954-0e0f-4caa-82c8-cf9581e89336.jpg \n", "29 694e976e-13ee-4471-9f89-ca75244f9334.jpg \n", "30 b988510e-0b76-4b92-a748-5f650ec3cdaa.jpg \n", "31 123cedc9-f0ee-432b-a087-7c4981e60c75.jpg \n", "32 1ece3817-f6ca-46a2-b514-f2d3de572399.jpg \n", "33 e102d2b6-60a1-427c-94d2-8ab6876cce30.jpg \n", "34 5b32191d-85d4-4990-b419-ae928b56597b.jpg \n", "35 58caa9fd-a3df-44ac-b296-05dbd8d02d47.jpg \n", "36 5f046ec9-4a0c-4c31-a80a-8f71f494d295.jpg \n", "37 f7aacbe8-e055-44ea-9eeb-ab89940dd5cb.jpg \n", "38 049e77f5-2a28-457a-9d27-2edb3da2fd7f.jpg \n", "39 
72647815-0e0d-4e4b-b320-0ac57d0f1cf4.jpg \n", "40 e612e27f-8be5-4dd2-a667-24ed31b3a2ac.jpg \n", "41 c43bcf77-b542-4c9a-a8b3-580e873c2baf.jpg \n", "42 77f0e079-e833-4ce9-8220-1142e57de747.jpg \n", "43 be0c33d8-bb30-47a8-a9d8-3ffef3bf5b95.jpg \n", "44 4cc0191c-175d-48e1-9808-ffbb7ea7ac57.jpg \n", "45 5ddd23ad-e060-4e8a-a5ef-f7d3179567ff.jpg \n", "46 43ac2647-95bc-4174-b9c1-0518a5024ed2.jpg \n", "47 3203d295-92c2-445d-a8de-9423b58ffb66.jpg \n", "48 24f57638-6748-4f65-8fff-747fac6d003c.jpg \n", "49 ad0352de-3584-4ed7-8e80-083908297db8.jpg \n", "\n", " description \n", "0 end_header_id|>\\n\\n**\"A celestial-inspired des... \n", "1 end_header_id|>\\n\\n{\"Title\": \"White Sweater\", ... \n", "2 end_header_id|>\\n\\n{\"Title\": \"Simple Gray T-Sh... \n", "3 end_header_id|>\\n\\n{ \"Title\": \"Summer Short Je... \n", "4 end_header_id|>\\n\\nI cannot provide a response... \n", "5 end_header_id|>\\n\\nI cannot provide a response... \n", "6 end_header_id|>\\n\\n{\"Title\": \"Fitted Baby Cap\"... \n", "7 end_header_id|>\\n\\nHere is the caption for the... \n", "8 end_header_id|>\\n\\nHere is the description of ... \n", "9 end_header_id|>\\n\\n{\"Title\": \"A t-shirt with a... \n", "10 end_header_id|>\\n\\nI cannot provide a response... \n", "11 end_header_id|>\\n\\n{\"Title\": \"Red Beanie\", \"Si... \n", "12 end_header_id|>\\n\\n{\"Title\": \"Long Sleeve Shir... \n", "13 end_header_id|>\\n\\n{\"Title\": \"Olive Green T-Sh... \n", "14 end_header_id|>\\n\\n**Product Description**\\n\\n... \n", "15 end_header_id|>\\n\\nI can't assist with that re... \n", "16 end_header_id|>\\n\\n**Title:** \"Playful Cat Pri... \n", "17 end_header_id|>\\n\\n{\"Title\": \"White top with f... \n", "18 end_header_id|>\\n\\n{\"Title\": \"Rainbow Dress\", ... \n", "19 end_header_id|>\\n\\n{\"Title\": \"White Long-sleev... \n", "20 end_header_id|>\\n\\nI cannot identify the size ... \n", "21 end_header_id|>\\n\\n{\"Title\": \"Off-White Boatne... 
\n", "22 end_header_id|>\\n\\n{ \\n \"Title\": \"Gray and ... \n", "23 end_header_id|>\\n\\n{\"Title\": \"Business Pants\",... \n", "24 end_header_id|>\\n\\n{\"Title\": \"Rugby Shirt\", \"S... \n", "25 end_header_id|>\\n\\n{\"Title\": \"Rose Blouse with... \n", "26 end_header_id|>\\n\\n{\"Title\": \"Pink t-shirt dre... \n", "27 end_header_id|>\\n\\n{\"Title\": \"Blue Silk Blouse... \n", "28 end_header_id|>\\n\\n{\"Title\": \"Dark Grey Washed... \n", "29 end_header_id|>\\n\\n{\"Title\": \"Pink long-sleeve... \n", "30 end_header_id|>\\n\\n{\"Title\": \"Baby Girl's Dres... \n", "31 end_header_id|>\\n\\n{\"Title\": \"Blue Sleeveless ... \n", "32 end_header_id|>\\n\\n{\"Title\": \"White Sneakers\",... \n", "33 end_header_id|>\\n\\n{\"Title\": \"John Rick t-shir... \n", "34 end_header_id|>\\n\\nI cannot provide a response... \n", "35 end_header_id|>\\n\\n{\"Title\": \"Colorblock Dress... \n", "36 end_header_id|>\\n\\n{\"Title\": \"Blue, Black, and... \n", "37 end_header_id|>\\n\\n{\"Title\": \"Women's Turquois... \n", "38 end_header_id|>\\n\\n{\"Title\": \"Power Full Main\"... \n", "39 end_header_id|>\\n\\nI cannot satisfy your reque... \n", "40 end_header_id|>\\n\\nI cannot generate descripti... \n", "41 end_header_id|>\\n\\nHere is the caption for the... \n", "42 end_header_id|>\\n\\nI cannot confidently identi... \n", "43 end_header_id|>\\n\\n**Clothing Details**\\n\\n* *... \n", "44 end_header_id|>\\n\\n**Description of the Tights... \n", "45 end_header_id|>\\n\\n{\"Title\": \"Pinstripe Collar... \n", "46 end_header_id|>\\n\\nI cannot endorse or promote... \n", "47 end_header_id|>\\n\\n**{\"Title\": \"Casual T-Shirt... \n", "48 end_header_id|>\\n\\n{\"Title\": \"Elegant Quilted ... \n", "49 end_header_id|>\\n\\n**JSON Response**\\n\\n{\"Titl... 
" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"./captions_testing.csv\")\n", "df" ] }, { "cell_type": "code", "execution_count": 46, "id": "29f88023-61b1-4861-93e7-bde0f3c90cff", "metadata": {}, "outputs": [], "source": [ "#fin" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.10" } }, "nbformat": 4, "nbformat_minor": 5 }