|
@@ -0,0 +1,424 @@
|
|
|
+{
|
|
|
+ "cells": [
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "# Extracting Structured Data from Images\n",
|
|
|
+ "[](https://colab.research.google.com/github/togethercomputer/together-cookbook/blob/main/Structured_Text_Extraction_from_Images.ipynb)"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "## Introduction\n",
|
|
|
+ "\n",
|
|
|
+ "In this notebook we will demonstrate how you can use a language vision model(Llama 3.2 90B Vision) along with an LLM that has JSON mode enabled(Llama 3.1 70B) to extract structured text from images.\n",
|
|
|
+ "\n",
|
|
|
+ "In our case we will extract line items from an invoice in the form of a JSON.\n",
|
|
|
+ "\n",
|
|
|
+ "<img src=\"images\\structured_text_image.png\" width=\"750\">\n"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "### Install relevant libraries"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "code",
|
|
|
+ "execution_count": 1,
|
|
|
+ "metadata": {
|
|
|
+ "colab": {
|
|
|
+ "base_uri": "https://localhost:8080/"
|
|
|
+ },
|
|
|
+ "id": "BdXwZoEej4cy",
|
|
|
+ "outputId": "83495aba-13dd-4bf9-8e65-54d44137d4ad"
|
|
|
+ },
|
|
|
+ "outputs": [
|
|
|
+ {
|
|
|
+ "name": "stdout",
|
|
|
+ "output_type": "stream",
|
|
|
+ "text": [
|
|
|
+ "Collecting together\n",
|
|
|
+ " Downloading together-1.3.3-py3-none-any.whl.metadata (11 kB)\n",
|
|
|
+ "Requirement already satisfied: aiohttp<4.0.0,>=3.9.3 in /usr/local/lib/python3.10/dist-packages (from together) (3.10.10)\n",
|
|
|
+ "Requirement already satisfied: click<9.0.0,>=8.1.7 in /usr/local/lib/python3.10/dist-packages (from together) (8.1.7)\n",
|
|
|
+ "Requirement already satisfied: eval-type-backport<0.3.0,>=0.1.3 in /usr/local/lib/python3.10/dist-packages (from together) (0.2.0)\n",
|
|
|
+ "Requirement already satisfied: filelock<4.0.0,>=3.13.1 in /usr/local/lib/python3.10/dist-packages (from together) (3.16.1)\n",
|
|
|
+ "Requirement already satisfied: numpy>=1.23.5 in /usr/local/lib/python3.10/dist-packages (from together) (1.26.4)\n",
|
|
|
+ "Requirement already satisfied: pillow<11.0.0,>=10.3.0 in /usr/local/lib/python3.10/dist-packages (from together) (10.4.0)\n",
|
|
|
+ "Requirement already satisfied: pyarrow>=10.0.1 in /usr/local/lib/python3.10/dist-packages (from together) (16.1.0)\n",
|
|
|
+ "Requirement already satisfied: pydantic<3.0.0,>=2.6.3 in /usr/local/lib/python3.10/dist-packages (from together) (2.9.2)\n",
|
|
|
+ "Requirement already satisfied: requests<3.0.0,>=2.31.0 in /usr/local/lib/python3.10/dist-packages (from together) (2.32.3)\n",
|
|
|
+ "Requirement already satisfied: rich<14.0.0,>=13.8.1 in /usr/local/lib/python3.10/dist-packages (from together) (13.9.2)\n",
|
|
|
+ "Requirement already satisfied: tabulate<0.10.0,>=0.9.0 in /usr/local/lib/python3.10/dist-packages (from together) (0.9.0)\n",
|
|
|
+ "Requirement already satisfied: tqdm<5.0.0,>=4.66.2 in /usr/local/lib/python3.10/dist-packages (from together) (4.66.5)\n",
|
|
|
+ "Requirement already satisfied: typer<0.13,>=0.9 in /usr/local/lib/python3.10/dist-packages (from together) (0.12.5)\n",
|
|
|
+ "Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.9.3->together) (2.4.3)\n",
|
|
|
+ "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.9.3->together) (1.3.1)\n",
|
|
|
+ "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.9.3->together) (24.2.0)\n",
|
|
|
+ "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.9.3->together) (1.4.1)\n",
|
|
|
+ "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.9.3->together) (6.1.0)\n",
|
|
|
+ "Requirement already satisfied: yarl<2.0,>=1.12.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.9.3->together) (1.15.2)\n",
|
|
|
+ "Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.9.3->together) (4.0.3)\n",
|
|
|
+ "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3.0.0,>=2.6.3->together) (0.7.0)\n",
|
|
|
+ "Requirement already satisfied: pydantic-core==2.23.4 in /usr/local/lib/python3.10/dist-packages (from pydantic<3.0.0,>=2.6.3->together) (2.23.4)\n",
|
|
|
+ "Requirement already satisfied: typing-extensions>=4.6.1 in /usr/local/lib/python3.10/dist-packages (from pydantic<3.0.0,>=2.6.3->together) (4.12.2)\n",
|
|
|
+ "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.31.0->together) (3.4.0)\n",
|
|
|
+ "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.31.0->together) (3.10)\n",
|
|
|
+ "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.31.0->together) (2.2.3)\n",
|
|
|
+ "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.31.0->together) (2024.8.30)\n",
|
|
|
+ "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.10/dist-packages (from rich<14.0.0,>=13.8.1->together) (3.0.0)\n",
|
|
|
+ "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from rich<14.0.0,>=13.8.1->together) (2.18.0)\n",
|
|
|
+ "Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.10/dist-packages (from typer<0.13,>=0.9->together) (1.5.4)\n",
|
|
|
+ "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.10/dist-packages (from markdown-it-py>=2.2.0->rich<14.0.0,>=13.8.1->together) (0.1.2)\n",
|
|
|
+ "Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.10/dist-packages (from yarl<2.0,>=1.12.0->aiohttp<4.0.0,>=3.9.3->together) (0.2.0)\n",
|
|
|
+ "Downloading together-1.3.3-py3-none-any.whl (68 kB)\n",
|
|
|
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m68.1/68.1 kB\u001b[0m \u001b[31m1.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
|
|
|
+ "\u001b[?25hInstalling collected packages: together\n",
|
|
|
+ "Successfully installed together-1.3.3\n"
|
|
|
+ ]
|
|
|
+ }
|
|
|
+ ],
|
|
|
+ "source": [
|
|
|
+ "!pip install together"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "code",
|
|
|
+ "execution_count": 4,
|
|
|
+ "metadata": {
|
|
|
+ "id": "HVTULp3InIOW"
|
|
|
+ },
|
|
|
+ "outputs": [],
|
|
|
+ "source": [
|
|
|
+ "import together, os\n",
|
|
|
+ "\n",
|
|
|
+ "# Paste in your Together AI API Key or load it\n",
|
|
|
+ "TOGETHER_API_KEY = os.environ.get(\"TOGETHER_API_KEY\")"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "## Create Invoice Structure using Pydantic\n",
|
|
|
+ "\n",
|
|
|
+ "We need a way of telling the LLM what structure to organize information into - including what information to expect in the receipt. We will do this using `pydantic` models.\n",
|
|
|
+ "\n",
|
|
|
+ "Below we define the required classes.\n",
|
|
|
+ "\n",
|
|
|
+ "- Each line item on the receipt will have a `name`, `price` and `quantity`. The `Item` class specifies this.\n",
|
|
|
+ "- Each receipt/invoice is a combination of multiple line `Item` elements along with a `total` price. The `Receipt` class specifies this."
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "code",
|
|
|
+ "execution_count": 12,
|
|
|
+ "metadata": {
|
|
|
+ "id": "iY7Xe0Bjk_Zc"
|
|
|
+ },
|
|
|
+ "outputs": [],
|
|
|
+ "source": [
|
|
|
+ "import json\n",
|
|
|
+ "from pydantic import BaseModel, Field\n",
|
|
|
+ "\n",
|
|
|
+ "class Item(BaseModel):\n",
|
|
|
+ " name: str\n",
|
|
|
+ " price: float\n",
|
|
|
+ " quantity: int = Field(default=1)\n",
|
|
|
+ "\n",
|
|
|
+ "class Receipt(BaseModel):\n",
|
|
|
+ " items: list[Item]\n",
|
|
|
+ " total: float"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {
|
|
|
+ "id": "8aPkxE7MnbkX"
|
|
|
+ },
|
|
|
+ "source": [
|
|
|
+ "## Lets bring in the reciept that we want to extract information from\n",
|
|
|
+ "\n",
|
|
|
+ "Notice that this is a real receipt with multiple portions that are not relevant to the line item extraction structure we've outlined above.\n",
|
|
|
+ "\n",
|
|
|
+ "<img src=\"https://ocr.space/Content/Images/receipt-ocr-original.webp\" height=\"500\">"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "## 1. Extract Information Receipt\n",
|
|
|
+ "\n",
|
|
|
+ "We will use the Llama 3.2 90B Vision model to extract out information in normal text format."
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "code",
|
|
|
+ "execution_count": 16,
|
|
|
+ "metadata": {
|
|
|
+ "id": "Z8MegIo-lG5s"
|
|
|
+ },
|
|
|
+ "outputs": [],
|
|
|
+ "source": [
|
|
|
+ "from together import Together\n",
|
|
|
+ "\n",
|
|
|
+ "getDescriptionPrompt = \"Extract out the details from each line item on the receipt image. Identify the name, price and quantity of each item. Also specify the total.\"\n",
|
|
|
+ "\n",
|
|
|
+ "imageUrl = \"https://ocr.space/Content/Images/receipt-ocr-original.webp\"\n",
|
|
|
+ "\n",
|
|
|
+ "client = Together(api_key=TOGETHER_API_KEY)\n",
|
|
|
+ "\n",
|
|
|
+ "response = client.chat.completions.create(\n",
|
|
|
+ " model=\"meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo\",\n",
|
|
|
+ " messages=[\n",
|
|
|
+ " {\n",
|
|
|
+ " \"role\": \"user\",\n",
|
|
|
+ " \"content\": [\n",
|
|
|
+ " {\"type\": \"text\", \"text\": getDescriptionPrompt},\n",
|
|
|
+ " {\n",
|
|
|
+ " \"type\": \"image_url\",\n",
|
|
|
+ " \"image_url\": {\n",
|
|
|
+ " \"url\": imageUrl,\n",
|
|
|
+ " },\n",
|
|
|
+ " },\n",
|
|
|
+ " ],\n",
|
|
|
+ " }\n",
|
|
|
+ " ],\n",
|
|
|
+ ")\n",
|
|
|
+ "\n",
|
|
|
+ "info = response.choices[0].message.content"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "code",
|
|
|
+ "execution_count": 17,
|
|
|
+ "metadata": {
|
|
|
+ "colab": {
|
|
|
+ "base_uri": "https://localhost:8080/"
|
|
|
+ },
|
|
|
+ "id": "mcpfNLWaoGWX",
|
|
|
+ "outputId": "f07bcaa0-5521-43e9-d614-4d3405c6e20e"
|
|
|
+ },
|
|
|
+ "outputs": [
|
|
|
+ {
|
|
|
+ "name": "stdout",
|
|
|
+ "output_type": "stream",
|
|
|
+ "text": [
|
|
|
+ "The receipt shows a total of 17 line items. The details for each item are as follows:\n",
|
|
|
+ "\n",
|
|
|
+ "1. Pet Toy: $1.97, quantity - 1\n",
|
|
|
+ "2. Floppy Puppy: $1.97, quantity - 1\n",
|
|
|
+ "3. Sssupreme S: $4.97, quantity - 1\n",
|
|
|
+ "4. 2.5 Squeak: $5.92, quantity - 1\n",
|
|
|
+ "5. Munchy Dmbel: $3.77, quantity - 1\n",
|
|
|
+ "6. Dog Treat: $2.92, quantity - 1\n",
|
|
|
+ "7. Ped Pch 1: $0.50, quantity - 1 (x2)\n",
|
|
|
+ "8. Coupon: $1.00, quantity - 1\n",
|
|
|
+ "9. Hnymd Smores: $3.98, quantity - 1\n",
|
|
|
+ "10. French Drsng: $1.98, quantity - 1\n",
|
|
|
+ "11. 3 Oranges: $5.47, quantity - 1\n",
|
|
|
+ "12. Baby Carrots: $1.48, quantity - 1\n",
|
|
|
+ "13. Collards: $1.24, quantity - 1\n",
|
|
|
+ "14. Calzone: $2.50, quantity - 1\n",
|
|
|
+ "15. Mm Rvw Mnt: $19.77, quantity - 1\n",
|
|
|
+ "16. Stkobrlplabl: $1.97, quantity - 1 (x6)\n",
|
|
|
+ "17. Dry Dog: $12.44, quantity - 1\n",
|
|
|
+ "\n",
|
|
|
+ "The total is $98.21.\n"
|
|
|
+ ]
|
|
|
+ }
|
|
|
+ ],
|
|
|
+ "source": [
|
|
|
+ "print(info)"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "Notice that the model is not perfect and wasn't able to extract out some line items. It's hard for most models to perform this zero-shot extraction of data from images. A way to improve this is to finetune the model using [Visual Intruction Tuning](https://arxiv.org/abs/2304.08485)."
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "## 2. Organize Information as JSON\n",
|
|
|
+ "\n",
|
|
|
+ "We will use Llama 3.1 70B with structured generation in JSON mode to organize the information extracted by the vision model into an acceptable JSON format that can be parsed.\n",
|
|
|
+ "\n",
|
|
|
+ "`Meta-Llama-3.1-70B-Instruct-Turbo` will strcitly respect the JSON schema passed to it."
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "code",
|
|
|
+ "execution_count": 18,
|
|
|
+ "metadata": {
|
|
|
+ "id": "kdGlyO8hnD7v"
|
|
|
+ },
|
|
|
+ "outputs": [],
|
|
|
+ "source": [
|
|
|
+ "extract = client.chat.completions.create(\n",
|
|
|
+ " messages=[\n",
|
|
|
+ " {\n",
|
|
|
+ " \"role\": \"system\",\n",
|
|
|
+ " \"content\": \"The following is a detailed description of all the items, prices and quantities on a receipt. Extract out information. Only answer in JSON.\",\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"role\": \"user\",\n",
|
|
|
+ " \"content\": info,\n",
|
|
|
+ " },\n",
|
|
|
+ " ],\n",
|
|
|
+ " model=\"meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo\",\n",
|
|
|
+ " response_format={\n",
|
|
|
+ " \"type\": \"json_object\",\n",
|
|
|
+ " \"schema\": Receipt.model_json_schema(),\n",
|
|
|
+ " },\n",
|
|
|
+ " )"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "code",
|
|
|
+ "execution_count": 19,
|
|
|
+ "metadata": {
|
|
|
+ "colab": {
|
|
|
+ "base_uri": "https://localhost:8080/"
|
|
|
+ },
|
|
|
+ "id": "037sAvDRoaB4",
|
|
|
+ "outputId": "fb058f0f-5a6a-4dd0-a332-7441b98e862f"
|
|
|
+ },
|
|
|
+ "outputs": [
|
|
|
+ {
|
|
|
+ "name": "stdout",
|
|
|
+ "output_type": "stream",
|
|
|
+ "text": [
|
|
|
+ "{\n",
|
|
|
+ " \"items\": [\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"Pet Toy\",\n",
|
|
|
+ " \"price\": 1.97,\n",
|
|
|
+ " \"quantity\": 1\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"Floppy Puppy\",\n",
|
|
|
+ " \"price\": 1.97,\n",
|
|
|
+ " \"quantity\": 1\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"Sssupreme S\",\n",
|
|
|
+ " \"price\": 4.97,\n",
|
|
|
+ " \"quantity\": 1\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"2.5 Squeak\",\n",
|
|
|
+ " \"price\": 5.92,\n",
|
|
|
+ " \"quantity\": 1\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"Munchy Dmbel\",\n",
|
|
|
+ " \"price\": 3.77,\n",
|
|
|
+ " \"quantity\": 1\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"Dog Treat\",\n",
|
|
|
+ " \"price\": 2.92,\n",
|
|
|
+ " \"quantity\": 1\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"Ped Pch 1\",\n",
|
|
|
+ " \"price\": 0.5,\n",
|
|
|
+ " \"quantity\": 2\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"Coupon\",\n",
|
|
|
+ " \"price\": -1.0,\n",
|
|
|
+ " \"quantity\": 1\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"Hnymd Smores\",\n",
|
|
|
+ " \"price\": 3.98,\n",
|
|
|
+ " \"quantity\": 1\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"French Drsng\",\n",
|
|
|
+ " \"price\": 1.98,\n",
|
|
|
+ " \"quantity\": 1\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"3 Oranges\",\n",
|
|
|
+ " \"price\": 5.47,\n",
|
|
|
+ " \"quantity\": 1\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"Baby Carrots\",\n",
|
|
|
+ " \"price\": 1.48,\n",
|
|
|
+ " \"quantity\": 1\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"Collards\",\n",
|
|
|
+ " \"price\": 1.24,\n",
|
|
|
+ " \"quantity\": 1\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"Calzone\",\n",
|
|
|
+ " \"price\": 2.5,\n",
|
|
|
+ " \"quantity\": 1\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"Mm Rvw Mnt\",\n",
|
|
|
+ " \"price\": 19.77,\n",
|
|
|
+ " \"quantity\": 1\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"Stkobrlplabl\",\n",
|
|
|
+ " \"price\": 1.97,\n",
|
|
|
+ " \"quantity\": 6\n",
|
|
|
+ " },\n",
|
|
|
+ " {\n",
|
|
|
+ " \"name\": \"Dry Dog\",\n",
|
|
|
+ " \"price\": 12.44,\n",
|
|
|
+ " \"quantity\": 1\n",
|
|
|
+ " }\n",
|
|
|
+ " ],\n",
|
|
|
+ " \"total\": 98.21\n",
|
|
|
+ "}\n"
|
|
|
+ ]
|
|
|
+ }
|
|
|
+ ],
|
|
|
+ "source": [
|
|
|
+ "output = json.loads(extract.choices[0].message.content)\n",
|
|
|
+ "print(json.dumps(output, indent=2))"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "Althought with some missed line items we were able to extract out structured JSON from an image in a zero shot manner! To improve the results for your pipeline and make them production ready I recommend you [finetune](https://docs.together.ai/docs/fine-tuning-overview) the vision model on your own dataset!\n",
|
|
|
+ "\n",
|
|
|
+ "Learn more about how to use JSON mode in the [docs](https://docs.together.ai/docs/json-mode) here!"
|
|
|
+ ]
|
|
|
+ }
|
|
|
+ ],
|
|
|
+ "metadata": {
|
|
|
+ "colab": {
|
|
|
+ "provenance": []
|
|
|
+ },
|
|
|
+ "kernelspec": {
|
|
|
+ "display_name": "Python 3",
|
|
|
+ "name": "python3"
|
|
|
+ },
|
|
|
+ "language_info": {
|
|
|
+ "name": "python"
|
|
|
+ }
|
|
|
+ },
|
|
|
+ "nbformat": 4,
|
|
|
+ "nbformat_minor": 0
|
|
|
+}
|