Llama-3.2-11B

Model: Llama-3.2-11B-Vision-Instruct

Here's the detailed outline:
The dataset consists of 5,000 images along with some metadata.
The first half covers preparing the dataset for labeling:
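As a rough sketch of this prep step, the snippet below parses the metadata and drops rows without a usable image reference. The column names (`image`, `title`) are assumptions; the real metadata schema may differ.

```python
import csv
import io

def prepare_records(csv_text):
    """Parse the metadata CSV and keep only rows that have an image path,
    so the labeling loop never receives an empty reference."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return [row for row in rows if row.get("image", "").strip()]

# Tiny inline example standing in for the real 5,000-row metadata file.
sample = "image,title\nimg_001.jpg,Blue denim jacket\n,Row with missing image\n"
records = prepare_records(sample)
print(len(records))  # only the valid row survives
```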
The second half consists of labeling the dataset. We are bound by an interesting constraint here: the 11B model can only caption one image at a time:
After running the script on the entire dataset, we have more data cleaning to perform:
Even with our lengthy prompt (among other things), the model still hallucinates categories and labels; we need to address this.
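One simple way to handle hallucinated categories is to snap the model's output onto a closed set of allowed categories and flag anything that doesn't match. The category list below is illustrative, not the dataset's real taxonomy:

```python
import difflib

# Illustrative allowed set -- the actual dataset taxonomy may differ.
ALLOWED = ["Topwear", "Bottomwear", "Footwear", "Accessories"]

def clean_category(raw):
    """Map a (possibly hallucinated) category onto the allowed set,
    or return None so the row can be flagged for re-labeling."""
    raw = raw.strip()
    if raw in ALLOWED:
        return raw
    # Fuzzy-match close variants/misspellings the model sometimes emits.
    match = difflib.get_close_matches(raw, ALLOWED, n=1, cutoff=0.7)
    return match[0] if match else None

print(clean_category("topwear"))     # case variant -> "Topwear"
print(clean_category("Spacesuits"))  # hallucinated -> None
```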
Now, we are ready to try our vector db pipeline:
With the cleaned descriptions and dataset, we can now store these in a vector-db
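The store/query shape can be sketched with a tiny in-memory stand-in. In the real pipeline the descriptions would be embedded with a sentence-embedding model and stored in an actual vector database; the bag-of-words vectors here are only to make the example self-contained:

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class TinyVectorStore:
    """Minimal in-memory vector store: add descriptions, query by similarity."""
    def __init__(self):
        self.items = []  # (item_id, vector) pairs

    def add(self, item_id, description):
        self.items.append((item_id, embed(description)))

    def query(self, text, k=2):
        q = embed(text)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [item_id for item_id, _ in ranked[:k]]

store = TinyVectorStore()
store.add("img_1", "blue denim jacket with buttons")
store.add("img_2", "red floral summer dress")
print(store.query("denim jacket", k=1))  # ['img_1']
```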
You will note that we are not using the categorization from our model; this is by design, to show how RAG can simplify a lot of things.
We try the approach with different retrieval methods.
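To make "different retrieval methods" concrete, here is a sketch comparing two scorers over the same cleaned descriptions: plain keyword overlap versus cosine similarity over bag-of-words vectors. These are illustrative stand-ins; the showcase itself may use embedding-based retrieval:

```python
import math
from collections import Counter

docs = {
    "img_1": "blue denim jacket with metal buttons",
    "img_2": "light blue jeans classic denim",
    "img_3": "red floral summer dress",
}

def keyword_score(query, doc):
    # Count how many query terms appear in the document.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def cosine_score(query, doc):
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

def top1(scorer, query):
    return max(docs, key=lambda k: scorer(query, docs[k]))

query = "blue denim jacket"
print(top1(keyword_score, query), top1(cosine_score, query))
```

Both scorers agree on this toy query, but they diverge on longer descriptions, which is why trying more than one retrieval method is worthwhile.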
Finally, we can bring this all together in a Gradio App.
Task: We can further improve the description prompt. You will notice the description sometimes starts with the title of the garment, which causes retrieval of "similar" clothes instead of "complementary" items.
Credits and thanks: models and resources used in the showcase:
First, thanks to the author for providing the dataset on which we base this exercise: []()