{ "cells": [ { "cell_type": "markdown", "id": "616da569", "metadata": {}, "source": [ "# Metadata Extraction from Invoice Documents" ] }, { "cell_type": "markdown", "id": "feee55a0", "metadata": {}, "source": [ "*Copyright (c) Meta Platforms, Inc. and affiliates.\n", "This software may be used and distributed according to the terms of the Llama Community License Agreement.*" ] }, { "cell_type": "markdown", "id": "2df08514-3d42-4581-8927-137e3728fb7d", "metadata": {}, "source": [ "This tutorial shows you how to build an invoice processing system that automatically extracts structured data from invoice images. Using a two-stage pipeline with Llama's multimodal and text models, you'll transform varied invoice formats into clean, validated JSON data ready for import into accounting systems.\n", "\n", "While traditional OCR tools struggle with diverse layouts and require extensive template configuration, the approach here uses Llama's vision capabilities to read varied invoice formats, enrich data with external services, and flag exceptions for human review, delivering the accuracy required for financial automation.\n", "\n", "## What you will learn\n", "\n", "- **Build a two-stage processing pipeline** that separates visual extraction from intelligent refinement for optimal accuracy and cost efficiency.\n", "- **Leverage Llama's multimodal capabilities** to extract data from invoice images with diverse layouts and formats.\n", "- **Use Llama API's tool calling** to enrich data with external services such as currency conversion APIs.\n", "- **Implement JSON structured output** to ensure consistent, reliable data extraction every time.\n", "\n", "| Component | Choice | Why |\n", "|:----------|:-------|:----|\n", "| **Architecture** | Two-Stage Pipeline | Separates concerns: Stage 1 focuses on accurate transcription, Stage 2 on refinement |\n", "| **Stage 1 Model** | Llama 4 
Maverick | Advanced vision capabilities for accurate text extraction from complex invoice layouts |\n", "| **Stage 2 Model** | Llama 4 Scout | Fast performance and tool calling for data refinement, validation, and enrichment |\n", "| **Infrastructure** | Llama API | Provides serverless, production-ready access to Llama models using the `llama_api_client` software development kit (SDK) |\n", "| **Output Format** | JSON Structured Output | Guarantees consistent schema compliance for seamless integration with accounting systems |\n", "\n", "---\n", "\n", "**Note on Inference Providers:** This tutorial uses Llama API for demonstration purposes. However, you can run Llama models with any preferred inference provider. Common examples include [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html) and [Together AI](https://together.ai/llama). The core logic of this tutorial can be adapted to any of these providers." ] }, { "cell_type": "markdown", "id": "688d4934-1279-4301-a49a-1ca3021d44bb", "metadata": {}, "source": [ "## Scenario snapshot\n", "\n", "- **Corpus:** The **Scanned Receipts OCR and Information Extraction (SROIE) dataset**, publicly available on [Kaggle](https://www.kaggle.com/datasets/urbikn/sroie-datasetv2). The dataset contains real-world receipts and invoices featuring diverse layouts, fonts, and formats that realistically simulate the challenges an AP department faces. 
Documents range from simple receipts to complex invoices with varying quality—from pristine digital exports to faded scans.\n", "\n", "**Example Invoice from SROIE Dataset:**\n", "\n", "![Sample Invoice](data/invoice_img/X51005230648.jpg)\n", "\n", "*Sample Malaysian invoice showing typical extraction challenges: foreign currency (RM), mixed date formats (29/01/2018), vendor information, and varying text quality.*\n", "\n", "- **Users:** The target users are **Accounts Payable (AP) specialists** and **accounting clerks** whose primary responsibility is to process incoming invoices for payment accurately and efficiently. They typically spend 3-5 minutes per invoice on manual data entry, leading to processing bottlenecks and human errors.\n", "\n", "- **Typical Asks:** The core task is to extract structured data from each invoice document. A typical extraction task would be:\n", " - Extract the following from the invoice: vendor_name, invoice_date, vendor_address, total_amount, and currency_symbol\n", " - Convert any foreign currency amounts to USD using current exchange rates\n", " - Flag any invoices with missing required fields or validation errors for manual review\n", "\n", "## Solution: Two-stage intelligent processing pipeline\n", "\n", "The solution is a two-stage pipeline that optimizes for both accuracy and cost by using different Llama models specialized for each stage's task:\n", "\n", "```mermaid\n", "flowchart TD\n", " A[Invoice Image] --> B[Stage 1: Accurate Transcription]\n", " B -- Raw JSON with Ambiguities --> C[Stage 2: Intelligent Refinement]\n", " C -- Enriched & Validated Data --> E[Final Structured JSON]\n", " \n", " subgraph \"Multimodal Llama Model\"\n", " B\n", " end\n", " \n", " subgraph \"Text Refinement\"\n", " C\n", " end\n", " \n", " D[External Tools, e.g., Currency API] <--> C\n", "```\n", "\n", "### Stage 1: Accurate transcription (vision to raw data)\n", "\n", "The first stage acts as the system's \"eyes,\" focusing entirely on 
converting visual information into structured text with maximum accuracy.\n", "\n", "**Key Strategy**: The model is instructed to **extract, not invent**. It captures any visual ambiguities in the `extraction_notes` field (e.g., \"Currency symbol appears to be 'RM' but could be 'SR' due to print quality\"), creating a raw but faithful digital representation with documented uncertainties.\n", "\n", "### Stage 2: Intelligent refinement (data to insights)\n", "\n", "The second stage acts as the system's \"brain,\" applying business logic, external knowledge, and validation rules to produce clean, reliable data.\n", "\n", "The separation of concerns enables you to use expensive multimodal models only for visual tasks while leveraging faster, cheaper text models for data refinement—reducing costs by up to 60% compared to using multimodal models throughout." ] }, { "cell_type": "markdown", "id": "51980214-fa96-444b-ae9b-169c22c2b432", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "Before you begin, ensure you have a Llama API key from [Llama API](https://llama.developer.meta.com/).\n", "\n", "## Install dependencies\n", "\n", "You will need a few libraries for this project: `llama-api-client` for API access and `pillow` for image handling." 
] }, { "cell_type": "code", "execution_count": null, "id": "05998f5a-2a16-422a-b340-54e4058717ef", "metadata": {}, "outputs": [], "source": [ "# Install dependencies\n", "!pip install --quiet llama-api-client pillow" ] }, { "cell_type": "markdown", "id": "eb85a399-4197-41b2-86c7-7b7920ba6b1c", "metadata": {}, "source": [ "## Imports & Llama API client setup\n", "\n", "Import the necessary modules and initialize the `LlamaAPIClient` with the API key read from the `LLAMA_API_KEY` environment variable.\n", "\n", "> **Note:** This tutorial uses Llama API, but you can adapt it to other inference providers such as [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html) and [Together AI](https://together.ai/llama)." ] }, { "cell_type": "code", "execution_count": 1, "id": "17c3192f-f5b8-4eda-9a33-5f7a6b304fc2", "metadata": {}, "outputs": [], "source": [ "import os, sys, json\n", "import base64\n", "import textwrap\n", "from pathlib import Path\n", "from typing import Dict, List, Optional, Any\n", "from datetime import datetime\n", "from io import BytesIO\n", "from PIL import Image\n", "from pydantic import BaseModel, Field\n", "from llama_api_client import LlamaAPIClient\n", "\n", "# --- Llama client ---\n", "API_KEY = os.getenv(\"LLAMA_API_KEY\")\n", "if not API_KEY:\n", "    sys.exit(\"❌ Please set the LLAMA_API_KEY environment variable.\")\n", "\n", "client = LlamaAPIClient(api_key=API_KEY)" ] }, { "cell_type": "markdown", "id": "5ef1c051-a1e4-4ca3-b093-bbefb75b9bb5", "metadata": {}, "source": [ "### Model configuration\n", "\n", "The tutorial uses two Llama models, each suited to a different task: Llama 4 Maverick for multimodal input processing (text + images) and Llama 4 Scout for fast text performance with tool calling capabilities." 
] }, { "cell_type": "code", "execution_count": null, "id": "e783e397-6f60-4ac3-bb23-2906147047d0", "metadata": {}, "outputs": [], "source": [ "# --- Constants & Configuration ---\n", "STAGE1_MODEL = \"Llama-4-Maverick-17B-128E-Instruct-FP8\"\n", "STAGE2_MODEL = \"Llama-4-Scout-17B-16E-Instruct-FP8\"\n", "MAX_COMPLETION_TOKENS = 2000 # Max tokens for model completion\n", "\n", "# Setting a token limit is a best practice for controlling costs and ensuring \n", "# predictable performance. This value acts as a safeguard, preventing runaway \n", "# requests while being high enough to handle complex invoices." ] }, { "cell_type": "markdown", "id": "2abc55b6-9b18-4d41-809c-0d9ceb1b2ec8", "metadata": {}, "source": [ "## Load sample invoice data\n", "\n", "The tutorial uses a small subset of the **SROIE dataset** (Scanned Receipts OCR and Information Extraction), containing real-world receipts and invoices with diverse layouts.\n", "\n", "> The full [SROIE dataset](https://www.kaggle.com/datasets/urbikn/sroie-datasetv2) contains thousands of invoices with varying quality—from pristine digital exports to faded scans." 
] }, { "cell_type": "code", "execution_count": 3, "id": "c10059b1-bfe1-44de-a712-9fba20813f1d", "metadata": {}, "outputs": [], "source": [ "# Path to SROIE dataset samples\n", "DATA_DIR = Path(\"data\")\n", "INVOICE_IMG_DIR = DATA_DIR / \"invoice_img\"\n", "INVOICE_JSON_DIR = DATA_DIR / \"invoice_json\"" ] }, { "cell_type": "code", "execution_count": 6, "id": "26aaa90a-254d-41fc-9b49-8fa6e9bd4e54", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✅ 10 invoices loaded\n" ] } ], "source": [ "def load_sroie_data():\n", "    \"\"\"Load SROIE dataset invoice images and ground truth.\"\"\"\n", "    invoice_images = sorted(INVOICE_IMG_DIR.glob(\"*.jpg\"))\n", "\n", "    invoices = []\n", "    for img_path in invoice_images:  # Load all invoice images\n", "        json_path = INVOICE_JSON_DIR / f\"{img_path.stem}.txt\"\n", "\n", "        ground_truth = {}\n", "        if json_path.exists():\n", "            with open(json_path, 'r') as f:\n", "                ground_truth = json.loads(f.read())\n", "\n", "        invoices.append({\n", "            \"image_path\": str(img_path),\n", "            \"filename\": img_path.name,\n", "            \"ground_truth\": ground_truth\n", "        })\n", "\n", "    print(f\"✅ {len(invoices)} invoices loaded\")\n", "    return invoices\n", "\n", "def load_invoice_image(image_path: str) -> str:\n", "    \"\"\"Load and encode invoice image for API calls.\"\"\"\n", "    with open(image_path, \"rb\") as img_file:\n", "        return base64.b64encode(img_file.read()).decode('utf-8')\n", "\n", "# Load SROIE dataset\n", "sroie_invoices = load_sroie_data()" ] }, { "cell_type": "markdown", "id": "327f6548-abed-488f-a6a9-8fc33540ddc5", "metadata": {}, "source": [ "## Stage 1 - Visual extraction with multimodal model\n", "\n", "**Stage 1 uses Llama 4 Maverick's multimodal capabilities** for accurate text extraction from complex invoice layouts. The key is to be **faithful to the source**: capture exactly what's visible, including any ambiguities." 
] }, { "cell_type": "markdown", "id": "1b24f4ff-9ba7-43ea-9d87-fbafedd18d8e", "metadata": {}, "source": [ "### Define the invoice schema\n", "\n", "We'll use Pydantic models to define our expected output structure, ensuring consistent extraction across all invoices." ] }, { "cell_type": "code", "execution_count": 7, "id": "d4953a45-13d1-48ac-bbcf-49a99bd31504", "metadata": {}, "outputs": [], "source": [ "# --- Pydantic Models for Structured Output ---\n", "class RawInvoiceData(BaseModel):\n", "    \"\"\"Model for Stage 1 raw extraction output - aligned with SROIE dataset.\"\"\"\n", "    vendor_name: str = Field(description=\"Company that issued the invoice\")\n", "    invoice_date: str = Field(\n", "        description=\"Date invoice was issued (preserve original format)\")\n", "    vendor_address: str = Field(description=\"Vendor address as shown on invoice\")\n", "    total_amount: str = Field(\n", "        description=\"Total amount as numeric string only (e.g., '123.45', no currency symbols)\")\n", "    currency_symbol: str = Field(\n", "        description=\"Currency symbol or code found (e.g., 'RM', '$', 'USD')\")\n", "    extraction_notes: str = Field(\n", "        description=\"Any visual ambiguities, unclear text, or alternative interpretations observed\")" ] }, { "cell_type": "markdown", "id": "0ff02edb-2e3d-4426-a60b-5cdbc060eac1", "metadata": {}, "source": [ "### Implement visual extraction\n", "\n", "**Prompt strategy**: The prompt instructs the model to act as a high-fidelity transcriber, separating numeric amounts from currency symbols and documenting visual uncertainties in `extraction_notes` for Stage 2 to resolve." 
] }, { "cell_type": "code", "execution_count": null, "id": "84e84536-cfe4-424c-b27e-34b41f6c9698", "metadata": {}, "outputs": [], "source": [ "def stage1_visual_extraction(image_path: str) -> Dict:\n", "    \"\"\"\n", "    Stage 1: Extract raw data from invoice image using multimodal model.\n", "    Focus: accurate transcription without interpretation.\n", "    \"\"\"\n", "\n", "    system_prompt = \"\"\"Extract invoice information from the image with perfect accuracy.\n", "\n", "EXTRACTION RULES:\n", "1. Extract ONLY what you clearly see - use \"UNCLEAR\" or \"NOT_FOUND\" if uncertain\n", "2. Preserve original date formats (e.g., \"15/01/2019\")\n", "3. total_amount: numeric value only (e.g., \"123.45\") - NO symbols or formatting\n", "4. currency_symbol: symbol/code only (e.g., \"RM\", \"$\", \"USD\")\n", "5. extraction_notes: document visual uncertainties and alternative interpretations\n", "\n", "EXAMPLES of extraction_notes:\n", "- \"Currency symbol appears to be 'RM' but could be 'SR' due to print quality\"\n", "- \"Date format ambiguous - could be DD/MM or MM/DD\"\n", "- \"Vendor name partially obscured\"\n", "\n", "Output using the provided JSON schema.\"\"\"\n", "\n", "    # Load and encode image\n", "    base64_image = load_invoice_image(image_path)\n", "\n", "    user_content = [\n", "        {\n", "            \"type\": \"text\",\n", "            \"text\": \"Extract all information from this invoice image.\"\n", "        },\n", "        {\n", "            \"type\": \"image_url\",\n", "            \"image_url\": {\"url\": f\"data:image/jpeg;base64,{base64_image}\"}\n", "        }\n", "    ]\n", "\n", "    response_format = {\n", "        \"type\": \"json_schema\",\n", "        \"json_schema\": {\n", "            \"name\": RawInvoiceData.__name__,\n", "            \"schema\": RawInvoiceData.model_json_schema(),\n", "        },\n", "    }\n", "\n", "    try:\n", "        response = client.chat.completions.create(\n", "            model=STAGE1_MODEL,\n", "            messages=[\n", "                {\"role\": \"system\", \"content\": system_prompt},\n", "                {\"role\": \"user\", \"content\": user_content}\n", "            ],\n", "            temperature=0.1,\n", "            max_completion_tokens=MAX_COMPLETION_TOKENS,\n", "            response_format=response_format\n", "        )\n", "        result = json.loads(response.completion_message.content.text)\n", "\n", "        return {'success': True, 'raw_data': result}\n", "\n", "    except Exception as e:\n", "        return {'success': False, 'error': str(e)}" ] }, { "cell_type": "markdown", "id": "06b99d96-bccb-4807-a0ad-732ee885e84f", "metadata": {}, "source": [ "Now let's process all SROIE invoices through Stage 1:" ] }, { "cell_type": "code", "execution_count": null, "id": "f7480ee9-3f69-4071-ae1e-3123cd6ec175", "metadata": {}, "outputs": [], "source": [ "from typing import Any, Dict, Optional\n", "\n", "def normalize_company_name(name: Optional[str]) -> str:\n", "    \"\"\"Normalize company name for consistent comparison.\"\"\"\n", "    return (name or \"\").upper().strip()\n", "\n", "def normalize_amount(amount: Any) -> str:\n", "    \"\"\"Normalize amount for consistent comparison by removing symbols and whitespace.\"\"\"\n", "    amount_str = str(amount or \"\")\n", "    return amount_str.replace('$', '').replace('RM', '').strip()\n", "\n", "def evaluate_extraction(data: Dict, gt: Dict, amount_field: str = 'total_amount') -> tuple[bool, bool]:\n", "    \"\"\"Shared evaluation logic for both stages using normalization helpers.\"\"\"\n", "    company_norm = normalize_company_name(data.get('vendor_name'))\n", "    gt_company_norm = normalize_company_name(gt.get('company'))\n", "\n", "    amount_str = normalize_amount(data.get(amount_field))\n", "    gt_amount_str = normalize_amount(gt.get('total'))\n", "\n", "    return company_norm == gt_company_norm, amount_str == gt_amount_str\n", "\n", "def print_comparison(i, filename, data, gt, company_match, total_match,\n", "                     stage_name, amount_field='total_amount', extra_info=None):\n", "    \"\"\"Shared printing logic for both stages.\"\"\"\n", "    print(f\"[{i}] {filename}\")\n", "    print(f\"{stage_name}: {data.get('vendor_name')} | {data.get(amount_field)}\")\n", "    print(f\"Target: {gt.get('company')} | {gt.get('total')}\")\n", "    overall_match = company_match and total_match\n", "    print(f\"{'✅' if overall_match else '❌'} {'✓' if overall_match else '✗'} \"\n", "          f\"(Company:{'✓' if company_match else '✗'} | Amount:{'✓' if total_match else '✗'})\")\n", "    if extra_info: print(extra_info)\n", "    print()" ] }, { "cell_type": "code", "execution_count": null, "id": "92e049c1-2089-4428-b80a-2df42f36ae7a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔍 Processing invoices through Stage 1...\n", "[1] X00016469670.jpg\n", "Extracted: OJC MARKETING SDN BHD | 193.00\n", "Target OJC MARKETING SDN BHD | 193.00\n", "📝 Notes: The currency symbol 'SR' is used, which typically represents Saudi Riyal. The invoice is clearly marked as a 'TAX INVOICE' and includes details such as invoice number, date, cashier, sales person, and bill to information. The product details and total amount are also clearly listed.\n", "✅ ✓ (Company:✓ | Amount:✓)\n", "\n", "[2] X00016469671.jpg\n", "Extracted: OJC MARKETING SDN BHD | 170.00\n", "Target OJC MARKETING SDN BHD | 170.00\n", "📝 Notes: The currency symbol is not explicitly shown on the invoice, but the amounts are listed with two decimal places, suggesting a currency that uses this format, such as MYR (Malaysian Ringgit). The vendor is based in Malaysia, supporting this interpretation.\n", "✅ ✓ (Company:✓ | Amount:✓)\n", "\n", "[3] X51005200931.jpg\n", "Extracted: PERNIAGAAN ZHENG HUI | 436.20\n", "Target PERNIAGAAN ZHENG HUI | 436.20\n", "📝 Notes: The invoice is clear and legible, with all necessary information visible. The date format is DD/MM/YYYY.\n", "✅ ✓ (Company:✓ | Amount:✓)\n", "\n", "[4] X51005230605.jpg\n", "Extracted: PETRON BKT LANJAN SB | : 4.90\n", "Target PETRON BKT LANJAN SB | 4.90\n", "📝 Notes: The receipt appears to be from a Petron gas station, and it includes a purchase of food items and GST. The total amount is clearly stated as RM 4.90. 
The date is in the format DD/MM/YYYY.\n", "❌ ✗ (Company:✓ | Amount:✗)\n", "\n", "[5] X51005230616.jpg\n", "Extracted: Gerbang Alaf Restaurants Sdn Bhd (formerly known as Golden Arches Restaurants Sdn Bhd) | 38.90\n", "Target GERBANG ALAF RESTAURANTS SDN BHD | 38.90\n", "📝 Notes: The currency symbol is assumed to be 'RM' as it is the local currency in Malaysia where the invoice is from, but it is not explicitly shown on the invoice.\n", "❌ ✗ (Company:✗ | Amount:✓)\n", "\n", "[6] X51005230621.jpg\n", "Extracted: SIN LIANHAP SDN BHD | .$30\n", "Target SIN LIANHAP SDN BHD | 7.30\n", "📝 Notes: The total amount is listed as '7.30' under 'Payment', and the currency symbol is 'RM' as indicated next to the item prices.\n", "❌ ✗ (Company:✓ | Amount:✗)\n", "\n", "[7] X51005230648.jpg\n", "Extracted: CROSS CHANNEL NETWORK SDN. BHD. | 6.35\n", "Target CROSS CHANNEL NETWORK SDN. BHD. | 6.35\n", "📝 Notes: The invoice number is BTG-052332. The product purchased is 'SCHNEIDER E15R 13A SWITCH SOCKET OUTLET' with a quantity of 1. The total amount includes GST at 6%. The paid amount was RM 10.00, and the change given was RM 3.65. The GST summary shows SR @ A with an amount of RM 6.00 and tax of RM 0.36.\n", "✅ ✓ (Company:✓ | Amount:✓)\n", "\n", "[8] X51005230657.jpg\n", "Extracted: CROSS CHANNEL NETWORK SDN. BHD. | 10.00\n", "Target CROSS CHANNEL NETWORK SDN. BHD. | 7.95\n", "📝 Notes: The invoice is clear and legible. The date is in the format DD/MM/YYYY and includes a timestamp. The total amount is clearly stated as 'Total Amt Payable: 10.00'. The currency symbol 'RM' is used consistently throughout the invoice.\n", "❌ ✗ (Company:✓ | Amount:✗)\n", "\n", "[9] X51005230659.jpg\n", "Extracted: SWC ENTERPRISE SDN BHD | $patchy image obscuring total amount\n", "Target SWC ENTERPRISE SDN BHD | 8.00\n", "📝 Notes: The total amount is partially obscured by a patchy image, making it difficult to determine the exact value. 
The visible amount is '8.00', but it's unclear if this is the total or a subtotal. The currency symbol is not explicitly shown on the invoice.\n", "❌ ✗ (Company:✓ | Amount:✗)\n", "\n", "[10] X51005268275.jpg\n", "Extracted: LIGHTROOM GALLERY SDN BHD | 278.80\n", "Target LIGHTROOM GALLERY SDN BHD | 278.80\n", "📝 Notes: The image is a clear receipt from Lightroom Gallery Sdn Bhd, dated 20/11/2017. The total amount is RM 278.80. The receipt includes details of items purchased, GST, and payment information.\n", "✅ ✓ (Company:✓ | Amount:✓)\n", "\n", "✅ Stage 1: 10/10 processed | Accuracy: 90.0% company, 60.0% amount\n" ] } ], "source": [ "print(\"🔍 Processing invoices through Stage 1...\")\n", "\n", "stage1_results = []\n", "for i, invoice in enumerate(sroie_invoices, 1):\n", "    result = stage1_visual_extraction(invoice['image_path'])\n", "\n", "    if result['success']:\n", "        raw_data, gt = result['raw_data'], invoice['ground_truth']\n", "\n", "        # Normalize for comparison\n", "        company_match, total_match = evaluate_extraction(raw_data, gt)\n", "\n", "        print(f\"[{i}] {invoice['filename']}\")\n", "        print(f\"Extracted: {raw_data.get('vendor_name')} | {raw_data.get('total_amount')}\")\n", "        print(f\"Target {gt.get('company')} | {gt.get('total')}\")\n", "        if raw_data.get('extraction_notes') and raw_data['extraction_notes'].strip():\n", "            print(f\"📝 Notes: {raw_data['extraction_notes']}\")\n", "        overall_match = company_match and total_match\n", "        print(f\"{'✅' if overall_match else '❌'} {'✓' if overall_match else '✗'} \"\n", "              f\"(Company:{'✓' if company_match else '✗'} | \"\n", "              f\"Amount:{'✓' if total_match else '✗'})\\n\")\n", "\n", "        stage1_results.append({\n", "            'invoice': invoice, 'result': result,\n", "            'company_match': company_match, 'total_match': total_match\n", "        })\n", "    else:\n", "        print(f\"[{i}] {invoice['filename']} ❌ FAILED\\n\")\n", "        stage1_results.append({\n", "            'invoice': invoice, 'result': result,\n", "            'company_match': False, 'total_match': False\n", "        })\n", "\n", "successful = [r for r in stage1_results if r['result']['success']]\n", "company_accuracy = (sum(1 for r in successful if r['company_match']) /\n", "                    len(successful) if successful else 0)\n", "total_accuracy = (sum(1 for r in successful if r['total_match']) /\n", "                  len(successful) if successful else 0)\n", "\n", "print(f\"✅ Stage 1: {len(successful)}/{len(sroie_invoices)} processed | \"\n", "      f\"Accuracy: {company_accuracy:.1%} company, {total_accuracy:.1%} amount\")" ] }, { "cell_type": "markdown", "id": "9a877341-31bb-4d6e-b416-0f386c3f0086", "metadata": {}, "source": [ "## Stage 2 - Intelligent refinement with tool calling\n", "\n", "**Stage 2 uses Llama 4 Scout** for fast performance and tool calling. It applies business logic to resolve the ambiguities documented in Stage 1 and enriches the data with external services." ] }, { "cell_type": "markdown", "id": "be059571-e9df-4a38-8ecc-da02aae6557c", "metadata": {}, "source": [ "### Define tools for external services\n", "\n", "We'll create tools that the model can use to enrich the extracted data. These tools enable the system to perform currency conversion and data enrichment.\n", "\n", "> **Learn more about tool calling**: For comprehensive guidance on implementing tool calling with Llama models, see the [Meta Tool Calling Guide](https://llama.developer.meta.com/docs/features/tool-calling/).\n", "\n", "**Tool strategy**: This tutorial uses currency conversion to demonstrate tool calling with a clear purpose and structured data handling. The same pattern extends to other tools such as vendor validation, tax calculation, compliance checks, and other business logic integrations.\n", "\n", "**Implementation note**: The currency conversion implementation below uses static exchange rates for tutorial simplicity. 
In production, you would integrate with live currency APIs such as ExchangeRate-API, Fixer.io, or your financial system's currency service." ] }, { "cell_type": "code", "execution_count": 12, "id": "ab02441b-3c06-48e7-992a-38d5dc3c53df", "metadata": {}, "outputs": [], "source": [ "# --- Tool Definitions ---\n", "def get_currency_conversion_tool():\n", "    \"\"\"Define the currency conversion tool for Stage 2.\"\"\"\n", "    return {\n", "        \"type\": \"function\",\n", "        \"function\": {\n", "            \"name\": \"convert_currency\",\n", "            \"description\": \"Convert amount from one currency to USD using static exchange rates\",\n", "            \"parameters\": {\n", "                \"type\": \"object\",\n", "                \"properties\": {\n", "                    \"amount\": {\n", "                        \"type\": \"number\",\n", "                        \"description\": \"The amount to convert\"\n", "                    },\n", "                    \"from_currency\": {\n", "                        \"type\": \"string\",\n", "                        \"description\": \"Source currency code (MYR, EUR, GBP, SGD)\"\n", "                    },\n", "                    \"to_currency\": {\n", "                        \"type\": \"string\",\n", "                        \"description\": \"Target currency code (USD only)\"\n", "                    }\n", "                },\n", "                \"required\": [\"amount\", \"from_currency\", \"to_currency\"]\n", "            }\n", "        }\n", "    }" ] }, { "cell_type": "code", "execution_count": 13, "id": "d5cceab3-116a-495a-a402-c972c83e081e", "metadata": {}, "outputs": [], "source": [ "def convert_currency(amount: float, from_currency: str, to_currency: str = \"USD\") -> Dict:\n", "    \"\"\"Convert currency amounts using static exchange rates (tutorial implementation).\"\"\"\n", "\n", "    # Static rates for tutorial (August 2024)\n", "    rates = {\"MYR\": 0.21, \"EUR\": 1.09, \"GBP\": 1.27, \"SGD\": 0.74}\n", "\n", "    # Validate inputs\n", "    if not isinstance(amount, (int, float)) or amount < 0:\n", "        return {\"error\": \"Amount must be a non-negative number\"}\n", "\n", "    if from_currency == to_currency:\n", "        return {\n", "            \"converted_amount\": float(amount),\n", "            \"exchange_rate\": 1.0,\n", "            \"note\": \"No conversion needed\"\n", "        }\n", "\n", "    if from_currency in rates and to_currency == \"USD\":\n", "        converted = amount * rates[from_currency]\n", "        return {\n", "            \"converted_amount\": round(converted, 2),\n", "            \"exchange_rate\": rates[from_currency],\n", "            \"source_currency\": from_currency,\n", "            \"target_currency\": to_currency\n", "        }\n", "\n", "    return {\"error\": f\"Currency conversion from {from_currency} to {to_currency} not supported\"}" ] }, { "cell_type": "code", "execution_count": 14, "id": "9b3d5c65-fab5-41d8-ad2f-18f5b79a9526", "metadata": {}, "outputs": [], "source": [ "def execute_tool(tool_name: str, arguments: Dict) -> Dict:\n", "    \"\"\"Execute the requested tool and return results.\"\"\"\n", "\n", "    if tool_name == \"convert_currency\":\n", "        try:\n", "            return convert_currency(\n", "                amount=arguments[\"amount\"],\n", "                from_currency=arguments[\"from_currency\"],\n", "                to_currency=arguments[\"to_currency\"]\n", "            )\n", "        except KeyError as e:\n", "            return {\"error\": f\"Missing required argument: {e}\"}\n", "        except (TypeError, ValueError) as e:\n", "            return {\"error\": f\"Invalid argument type: {e}\"}\n", "\n", "    return {\"error\": f\"Unknown tool: {tool_name}\"}" ] }, { "cell_type": "markdown", "id": "6e626bcf-af9f-4900-a4da-f796a20a89b2", "metadata": {}, "source": [ "### Define enriched output schema\n", "\n", "The enriched schema structures Stage 1's raw extraction into clean, business-ready data with currency conversion and processing transparency." 
] }, { "cell_type": "code", "execution_count": 15, "id": "d76d2cfe-cc26-4953-baa6-d139f1ec9937", "metadata": {}, "outputs": [], "source": [ "class EnrichedInvoiceData(BaseModel):\n", "    \"\"\"Model for Stage 2 enriched output with currency conversion.\"\"\"\n", "    vendor_name: str = Field(description=\"Vendor company name\")\n", "    vendor_address: str = Field(description=\"Vendor address\")\n", "    invoice_date: str = Field(\n", "        description=\"Invoice date in ISO format (YYYY-MM-DD)\")\n", "    original_amount: str = Field(\n", "        description=\"Original amount as numeric string (e.g., '123.45', no symbols)\")\n", "    original_currency: str = Field(\n", "        description=\"Original currency code (e.g., 'MYR', 'USD')\")\n", "    converted_amount_usd: str = Field(\n", "        description=\"USD amount as numeric string (e.g., '25.89', no symbols)\")\n", "    exchange_rate: str = Field(\n", "        description=\"Exchange rate as numeric string (e.g., '0.21')\")\n", "    reasoning_notes: str = Field(description=\"AI reasoning summary\")" ] }, { "cell_type": "markdown", "id": "9aecd4ee-1e6f-49b9-b4bc-75fe9b91595d", "metadata": {}, "source": [ "### Implement intelligent refinement\n", "\n", "**Processing strategy**: Llama 4 Scout, with structured JSON output and tool calling, enriches the raw data from Stage 1. The model reads `extraction_notes` to resolve documented ambiguities using business context, standardizes dates, follows strict numeric formatting rules, and converts amounts to USD." 
] }, { "cell_type": "code", "execution_count": null, "id": "1370cffe-24f2-423c-b0bf-eb83527dcf59", "metadata": {}, "outputs": [], "source": [ "def stage2_intelligent_refinement(raw_data: Dict) -> Dict:\n", "    \"\"\"Stage 2: Refine and enrich raw extracted data using tool calling.\"\"\"\n", "\n", "    # Validate input\n", "    if not isinstance(raw_data, dict):\n", "        return {'success': False, 'error': 'Invalid input: raw_data must be a dictionary', 'stage': 2}\n", "\n", "    required_fields = ['vendor_name', 'total_amount', 'currency_symbol']\n", "    missing_fields = [field for field in required_fields if field not in raw_data]\n", "    if missing_fields:\n", "        return {'success': False, 'error': f'Missing required fields: {missing_fields}', 'stage': 2}\n", "\n", "    try:\n", "        tools = [get_currency_conversion_tool()]\n", "        response_format = {\n", "            \"type\": \"json_schema\",\n", "            \"json_schema\": {\n", "                \"name\": EnrichedInvoiceData.__name__,\n", "                \"schema\": EnrichedInvoiceData.model_json_schema(),\n", "            },\n", "        }\n", "\n", "        refinement_prompt = f\"\"\"Analyze the raw invoice data below and enrich it with currency conversion.\n", "\n", "Raw data from Stage 1:\n", "{json.dumps(raw_data, indent=2)}\n", "\n", "TASKS:\n", "1. Extract original_amount from total_amount (numeric string only)\n", "2. Resolve currency ambiguities using extraction_notes + context clues\n", "3. Standardize date to YYYY-MM-DD format\n", "4. Use convert_currency tool for non-USD amounts\n", "5. Document reasoning in reasoning_notes\n", "\n", "AMBIGUITY RESOLUTION:\n", "- Use extraction_notes to identify uncertainties from Stage 1\n", "- Attempt to resolve uncertainties based on the information you have\n", "- Apply context clues: Malaysian addresses suggest MYR currency\n", "- Reference extraction_notes findings in your reasoning_notes\n", "\n", "OUTPUT FORMATTING (numeric strings only):\n", "- original_amount: \"123.45\" (no symbols)\n", "- converted_amount_usd: \"25.89\" (no symbols)\n", "- exchange_rate: \"0.21\" (no symbols)\n", "- original_currency: \"MYR\" (code only, not \"RM\")\n", "\"\"\"\n", "\n", "        messages = [{\"role\": \"user\", \"content\": refinement_prompt}]\n", "\n", "        response = client.chat.completions.create(\n", "            model=STAGE2_MODEL,\n", "            messages=messages,\n", "            tools=tools,\n", "            temperature=0.3,\n", "            max_completion_tokens=MAX_COMPLETION_TOKENS,\n", "            response_format=response_format\n", "        )\n", "\n", "        # Handle tool calls if the model wants to use them\n", "        if response.completion_message.tool_calls:\n", "            # Add assistant's response to conversation\n", "            messages.append({\n", "                \"role\": \"assistant\",\n", "                \"tool_calls\": [{\n", "                    \"id\": call.id,\n", "                    \"function\": {\n", "                        \"name\": call.function.name,\n", "                        \"arguments\": call.function.arguments\n", "                    }\n", "                } for call in response.completion_message.tool_calls]\n", "            })\n", "\n", "            # Execute tools and add results\n", "            for tool_call in response.completion_message.tool_calls:\n", "                try:\n", "                    arguments = json.loads(tool_call.function.arguments)\n", "                    tool_result = execute_tool(tool_call.function.name, arguments)\n", "\n", "                    messages.append({\n", "                        \"role\": \"tool\",\n", "                        \"tool_call_id\": tool_call.id,\n", "                        \"content\": json.dumps(tool_result)\n", "                    })\n", "                except json.JSONDecodeError:\n", "                    return {'success': False, 'error': 'Invalid tool arguments JSON', 'stage': 2}\n", "\n", "            # Get final structured response\n", "            messages.append({\"role\": \"user\", \"content\":\n", "                \"Provide the complete enriched invoice data following the required schema. \"\n", "                \"Include conversion details in converted_amount_usd and exchange_rate fields.\"})\n", "\n", "            final_response = client.chat.completions.create(\n", "                model=STAGE2_MODEL,\n", "                messages=messages,\n", "                temperature=0.1,\n", "                max_completion_tokens=MAX_COMPLETION_TOKENS,\n", "                response_format=response_format\n", "            )\n", "\n", "            try:\n", "                enriched_data = json.loads(final_response.completion_message.content.text)\n", "            except json.JSONDecodeError:\n", "                return {'success': False, 'error': 'Invalid JSON response from final call', 'stage': 2}\n", "        else:\n", "            # No tools needed, parse direct response\n", "            try:\n", "                enriched_data = json.loads(response.completion_message.content.text)\n", "            except json.JSONDecodeError:\n", "                return {'success': False, 'error': 'Invalid JSON response from initial call', 'stage': 2}\n", "\n", "        # Validate required fields in enriched data\n", "        required_enriched_fields = ['vendor_name', 'original_amount']\n", "        missing_enriched = [field for field in required_enriched_fields if field not in enriched_data]\n", "        if missing_enriched:\n", "            return {'success': False, 'error': f'Missing enriched fields: {missing_enriched}', 'stage': 2}\n", "\n", "        return {\n", "            'success': True,\n", "            'stage': 2,\n", "            'enriched_data': enriched_data\n", "        }\n", "\n", "    except Exception as e:\n", "        return {\n", "            'success': False,\n", "            'error': str(e),\n", "            'stage': 2\n", "        }" ] }, { "cell_type": "markdown", "id": "8ab483f4-f7af-4d97-af6b-5cee107499eb", "metadata": {}, "source": [ "Now let's process the successful Stage 1 results through Stage 2:" ] }, { "cell_type": "code", "execution_count": 17, "id": "2730a07c-3572-4042-a5b3-aee8891f46b9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🧠 Processing invoices through Stage 2 (intelligent refinement)...\n", "[1] X00016469670.jpg\n", "Enriched: OJC MARKETING SDN BHD | 193.00\n", 
"Target: OJC MARKETING SDN BHD | 193.00\n", "✅ ✓ (Company:✓ | Amount:✓)\n", "💱 Converted: MYR → USD $46.09\n", "\n", "[2] X00016469671.jpg\n", "Enriched: OJC MARKETING SDN BHD | 170.00\n", "Target: OJC MARKETING SDN BHD | 170.00\n", "✅ ✓ (Company:✓ | Amount:✓)\n", "💱 Converted: MYR → USD $40.09\n", "\n", "[3] X51005200931.jpg\n", "Enriched: PERNIAGAAN ZHENG HUI | 436.20\n", "Target: PERNIAGAAN ZHENG HUI | 436.20\n", "✅ ✓ (Company:✓ | Amount:✓)\n", "💱 Converted: MYR → USD $97.53\n", "\n", "[4] X51005230605.jpg\n", "Enriched: PETRON BKT LANJAN SB | 4.90\n", "Target: PETRON BKT LANJAN SB | 4.90\n", "✅ ✓ (Company:✓ | Amount:✓)\n", "💱 Converted: MYR → USD $1.16\n", "\n", "[5] X51005230616.jpg\n", "Enriched: Gerbang Alaf Restaurants Sdn Bhd (formerly known as Golden Arches Restaurants Sdn Bhd) | 38.90\n", "Target: GERBANG ALAF RESTAURANTS SDN BHD | 38.90\n", "❌ ✗ (Company:✗ | Amount:✓)\n", "💱 Converted: MYR → USD $,{\n", "\n", "[6] X51005230621.jpg\n", "Enriched: SIN LIANHAP SDN BHD | 7.30\n", "Target: SIN LIANHAP SDN BHD | 7.30\n", "✅ ✓ (Company:✓ | Amount:✓)\n", "💱 Converted: MYR → USD $1.75\n", "\n", "[7] X51005230648.jpg\n", "Enriched: CROSS CHANNEL NETWORK SDN. BHD. | 6.35\n", "Target: CROSS CHANNEL NETWORK SDN. BHD. | 6.35\n", "✅ ✓ (Company:✓ | Amount:✓)\n", "💱 Converted: MYR → USD $[convert_currency(amount=6.35, from_currency='MYR', to_currency='USD')]\n", "\n", "[8] X51005230657.jpg\n", "Enriched: CROSS CHANNEL NETWORK SDN. BHD. | 10.00\n", "Target: CROSS CHANNEL NETWORK SDN. BHD. 
| 7.95\n", "❌ ✗ (Company:✓ | Amount:✗)\n", "💱 Converted: MYR → USD $2.40\n", "\n", "[9] X51005230659.jpg\n", "Enriched: SWC ENTERPRISE SDN BHD | 8.00\n", "Target: SWC ENTERPRISE SDN BHD | 8.00\n", "✅ ✓ (Company:✓ | Amount:✓)\n", "💱 Converted: MYR → USD $1.92\n", "\n", "[10] X51005268275.jpg\n", "Enriched: LIGHTROOM GALLERY SDN BHD | 278.80\n", "Target: LIGHTROOM GALLERY SDN BHD | 278.80\n", "✅ ✓ (Company:✓ | Amount:✓)\n", "💱 Converted: MYR → USD $62.49\n", "\n", "✅ Stage 2: 10/10 enriched | Accuracy: 90.0% company, 90.0% amount\n", " Currency conversions: 10 invoices\n" ] } ], "source": [ "print(\"🧠 Processing invoices through Stage 2 (intelligent refinement)...\")\n", "\n", "stage2_results = []\n", "successful_stage1 = [r for r in stage1_results if r['result']['success']]\n", "\n", "for i, stage1_result in enumerate(successful_stage1, 1):\n", " invoice = stage1_result['invoice']\n", " result = stage2_intelligent_refinement(stage1_result['result']['raw_data'])\n", " \n", " if result['success']:\n", " enriched, gt = result['enriched_data'], invoice['ground_truth']\n", " company_match, total_match = evaluate_extraction(enriched, gt, 'original_amount')\n", " \n", " # Currency conversion info\n", " extra_info = None\n", " if enriched.get('converted_amount_usd') and enriched.get('original_currency') not in ['USD', '$']:\n", " curr = enriched.get('original_currency', 'Unknown')\n", " usd = enriched.get('converted_amount_usd', 'N/A')\n", " extra_info = f\"💱 Converted: {curr} → USD ${usd}\"\n", " \n", " print_comparison(i, invoice['filename'], enriched, gt, company_match, total_match, \"Enriched\", 'original_amount', extra_info)\n", " \n", " stage2_results.append({\n", " 'invoice': invoice, 'stage1_result': stage1_result, 'stage2_result': result,\n", " 'company_match': company_match, 'total_match': total_match\n", " })\n", " else:\n", " print(f\"[{i}] {invoice['filename']} ❌ FAILED: {result.get('error', 'Unknown error')}\\n\")\n", " stage2_results.append({\n", " 'invoice': 
invoice, 'stage1_result': stage1_result, 'stage2_result': result,\n", " 'company_match': False, 'total_match': False\n", " })\n", "\n", "successful_stage2 = [r for r in stage2_results if r['stage2_result']['success']]\n", "company_accuracy = sum(1 for r in successful_stage2 if r['company_match']) / len(successful_stage2) if successful_stage2 else 0\n", "total_accuracy = sum(1 for r in successful_stage2 if r['total_match']) / len(successful_stage2) if successful_stage2 else 0\n", "conversions = [r for r in successful_stage2 if r['stage2_result']['enriched_data'].get('converted_amount_usd')]\n", "\n", "print(f\"✅ Stage 2: {len(successful_stage2)}/{len(successful_stage1)} enriched | \"\n", " f\"Accuracy: {company_accuracy:.1%} company, {total_accuracy:.1%} amount\")\n", "print(f\" Currency conversions: {len(conversions)} invoices\")" ] }, { "cell_type": "markdown", "id": "e6ba8960-da57-4d1c-bfaa-c4147755e834", "metadata": {}, "source": [ "Let's examine the final structured outputs from our two-stage pipeline:" ] }, { "cell_type": "code", "execution_count": 18, "id": "a28c3244-2e7a-4efb-9cc2-6e53933323c9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📋 Final Structured Outputs from Two-Stage Pipeline:\n", "\n", "[1] X00016469670.jpg:\n", "{\n", " \"vendor_name\": \"OJC MARKETING SDN BHD\",\n", " \"vendor_address\": \"NO 2 & 4, JALAN BAYU 4, BANDAR SERI ALAM, 81750 MASAI, JOHOR\",\n", " \"invoice_date\": \"2019-01-15\",\n", " \"original_amount\": \"193.00\",\n", " \"original_currency\": \"MYR\",\n", " \"converted_amount_usd\": \"46.09\",\n", " \"exchange_rate\": \"0.2387\",\n", " \"reasoning_notes\": \"The currency symbol 'SR' was initially provided, but based on the vendor address in Malaysia and the extraction notes, it seems there was a confusion. The address suggests the currency is likely MYR. 
The amount 193.00 was converted from MYR to USD using the exchange rate 0.2387, resulting in 46.09 USD.\"\n", "}\n", "--------------------------------------------------\n", "[2] X00016469671.jpg:\n", "{\n", " \"vendor_name\": \"OJC MARKETING SDN BHD\",\n", " \"vendor_address\": \"NO 2 & 4, JALAN BAYU 4, BANDAR SERI ALAM, 81750 MASAI, JOHOR\",\n", " \"invoice_date\": \"2019-02-01\",\n", " \"original_amount\": \"170.00\",\n", " \"original_currency\": \"MYR\",\n", " \"converted_amount_usd\": \"40.09\",\n", " \"exchange_rate\": \"0.2357\",\n", " \"reasoning_notes\": \"The currency symbol was not explicitly shown, but the vendor is based in Malaysia, and the amounts have two decimal places, suggesting MYR. The exchange rate used for conversion is based on static rates.\"\n", "}\n", "--------------------------------------------------\n", "[3] X51005200931.jpg:\n", "{\n", " \"vendor_name\": \"PERNIAGAAN ZHENG HUI\",\n", " \"vendor_address\": \"NO.59 JALAN PERMAS 9/5 BANDAR BARU PERMAS JAYA 81750 JOHOR BAHRU\",\n", " \"invoice_date\": \"2018-02-09\",\n", " \"original_amount\": \"436.20\",\n", " \"original_currency\": \"MYR\",\n", " \"converted_amount_usd\": \"97.53\",\n", " \"exchange_rate\": \"0.2236\",\n", " \"reasoning_notes\": \"The invoice contains a Malaysian address, suggesting the currency is MYR. The extraction notes mention that the invoice is clear and legible. The currency symbol 'RM' is commonly used in Malaysia to represent MYR. Therefore, it is reasonable to assume that the original currency is MYR. 
The convert_currency tool was used to convert the amount from MYR to USD.\"\n", "}\n", "--------------------------------------------------\n", "[4] X51005230605.jpg:\n", "{\n", " \"vendor_name\": \"PETRON BKT LANJAN SB\",\n", " \"vendor_address\": \"KM 458.4 BKT LANJAN UTARA, L/RAYA UTARA SELATAN,SG BULOH 47000 SUNGAI BULOH\",\n", " \"invoice_date\": \"2018-02-01\",\n", " \"original_amount\": \"4.90\",\n", " \"original_currency\": \"MYR\",\n", " \"converted_amount_usd\": \"1.16\",\n", " \"exchange_rate\": \"0.237\",\n", " \"reasoning_notes\": \"The extraction_notes mention that the receipt appears to be from a Petron gas station in Malaysia, and the total amount is clearly stated as RM 4.90. The vendor_address also suggests a Malaysian location, which implies the currency is MYR. The convert_currency tool was used to convert MYR 4.90 to USD.\"\n", "}\n", "--------------------------------------------------\n", "[5] X51005230616.jpg:\n", "{\n", " \"vendor_name\": \"Gerbang Alaf Restaurants Sdn Bhd (formerly known as Golden Arches Restaurants Sdn Bhd)\",\n", " \"vendor_address\": \"Level 6, Bangunan TH, Damansara Uptown3 No.3, Jalan SS21/39, 47400 Petaling Jaya Selangor\",\n", " \"invoice_date\": \"2018-01-18\",\n", " \"original_amount\": \"38.90\",\n", " \"original_currency\": \"MYR\",\n", " \"converted_amount_usd\": \",{\",\n", " \"exchange_rate\": \"\",\n", " \"reasoning_notes\": \"\"\n", "}\n", "--------------------------------------------------\n", "[6] X51005230621.jpg:\n", "{\n", " \"vendor_name\": \"SIN LIANHAP SDN BHD\",\n", " \"vendor_address\": \"LOT 13, JALAN IPOH, KG BATU 30, ULU YAM LAMA 44300 BTG KALI, SELANGOR\",\n", " \"invoice_date\": \"2018-05-02\",\n", " \"original_amount\": \"7.30\",\n", " \"original_currency\": \"MYR\",\n", " \"converted_amount_usd\": \"1.75\",\n", " \"exchange_rate\": \"0.24\",\n", " \"reasoning_notes\": \"The vendor address is in Malaysia, suggesting MYR currency. 
The extraction notes mention 'RM' next to item prices, which is the currency symbol for Malaysian Ringgit. The total amount is listed as '7.30' under 'Payment'. Using convert_currency tool to convert MYR to USD.\"\n", "}\n", "--------------------------------------------------\n", "[7] X51005230648.jpg:\n", "{\n", " \"vendor_name\": \"CROSS CHANNEL NETWORK SDN. BHD.\",\n", " \"vendor_address\": \"47, JALAN MERANTI 1, SEK. 3, BANDAR UTAMA BATANG KALI, 44300 BATANG KALI, SELANGOR\",\n", " \"invoice_date\": \"2018-01-29\",\n", " \"original_amount\": \"6.35\",\n", " \"original_currency\": \"MYR\",\n", " \"converted_amount_usd\": \"[convert_currency(amount=6.35, from_currency='MYR', to_currency='USD')]\",\n", " \"exchange_rate\": \"[convert_currency(amount=1, from_currency='MYR', to_currency='USD')]\",\n", " \"reasoning_notes\": \"The vendor address is in Malaysia, suggesting MYR currency. The currency symbol 'RM' is consistent with MYR. The extraction notes confirm the total amount includes GST at 6%, and the paid amount and change given are in RM, further supporting MYR as the original currency.\"\n", "}\n", "--------------------------------------------------\n", "[8] X51005230657.jpg:\n", "{\n", " \"vendor_name\": \"CROSS CHANNEL NETWORK SDN. BHD.\",\n", " \"vendor_address\": \"47, JALAN MERANTI 1, SEK. 3, BANDAR UTAMA BATANG KALI, 44300 BATANG KALI, SELANGOR\",\n", " \"invoice_date\": \"2017-12-31\",\n", " \"original_amount\": \"10.00\",\n", " \"original_currency\": \"MYR\",\n", " \"converted_amount_usd\": \"2.40\",\n", " \"exchange_rate\": \"0.24\",\n", " \"reasoning_notes\": \"The vendor address suggests a Malaysian origin, and the currency symbol 'RM' is commonly used in Malaysia, which corresponds to MYR. The extraction notes confirm the currency symbol 'RM' is used consistently throughout the invoice. Therefore, the original currency is MYR. 
The amount '10.00' is converted to USD using the convert_currency tool.\"\n", "}\n", "--------------------------------------------------\n", "[9] X51005230659.jpg:\n", "{\n", " \"vendor_name\": \"SWC ENTERPRISE SDN BHD\",\n", " \"vendor_address\": \"NO. 5-7, Jalan Mahagoni 7/1, Sekysen 4, Bandar Utama, 44300 Batang Kali, Selangor.\",\n", " \"invoice_date\": \"2018-01-08\",\n", " \"original_amount\": \"8.00\",\n", " \"original_currency\": \"MYR\",\n", " \"converted_amount_usd\": \"1.92\",\n", " \"exchange_rate\": \"0.24\",\n", " \"reasoning_notes\": \"The vendor address suggests a Malaysian origin, which implies the currency might be MYR. Given the partial obscuration of the total amount and the visible '8.00', it is reasonable to assume this is the total in MYR. The exchange rate used for conversion is based on static rates.\"\n", "}\n", "--------------------------------------------------\n", "[10] X51005268275.jpg:\n", "{\n", " \"vendor_name\": \"LIGHTROOM GALLERY SDN BHD\",\n", " \"vendor_address\": \"No: 28, JALAN ASTANA 1C, BANDAR BUKIT RAJA, 41050 KLANG SELANGOR D.E, MALAYSIA\",\n", " \"invoice_date\": \"2017-11-20\",\n", " \"original_amount\": \"278.80\",\n", " \"original_currency\": \"MYR\",\n", " \"converted_amount_usd\": \"62.49\",\n", " \"exchange_rate\": \"0.224\",\n", " \"reasoning_notes\": \"The extraction_notes indicate that the receipt is from Lightroom Gallery Sdn Bhd, dated 20/11/2017, and the total amount is RM 278.80. The vendor_address suggests a Malaysian location, which implies the currency is MYR. The currency_symbol 'RM' is commonly used for Malaysian Ringgit. Therefore, the original_currency is MYR. 
Using the convert_currency tool, we can convert the amount to USD.\"\n", "}\n", "--------------------------------------------------\n" ] } ], "source": [ "print(\"📋 Final Structured Outputs from Two-Stage Pipeline:\\n\")\n", "\n", "for i, result in enumerate(successful_stage2, 1):\n", " invoice = result['invoice']\n", " enriched_data = result['stage2_result']['enriched_data']\n", " \n", " print(f\"[{i}] {invoice['filename']}:\")\n", " print(json.dumps(enriched_data, indent=2))\n", " print(\"-\" * 50)" ] }, { "cell_type": "markdown", "id": "45c1632d-fcb7-4df8-96b1-72845e08d615", "metadata": {}, "source": [ "### Analyzing common failure patterns\n", "\n", "Even with high accuracy, LLM-based extraction can produce errors that reveal common challenges:\n", "\n", "- **Over-extraction:** The model extracts technically correct but contextually excessive information (e.g., including a company's former name, as in invoice 5 above). This pattern often requires stricter output formatting rules or more sophisticated post-processing.\n", "\n", "- **Ambiguous Layout:** The model incorrectly identifies a field because the document layout contains multiple plausible candidates (e.g., extracting a \"payment due\" amount instead of the \"invoice total\"). This class of errors is often best handled by implementing confidence scores to flag ambiguous cases for human review.\n", "\n", "To address ambiguous layouts, you can implement **confidence scoring** that automatically flags uncertain fields for human review, a critical step for high-value transactions. For over-extraction, you can refine your prompts with more specific formatting instructions or provide **few-shot examples** to guide the model toward the desired output structure.\n", "\n", "These failure patterns underscore that the goal is not to eliminate human involvement, but to augment it. 
A successful system reliably handles the majority of invoices while flagging complex exceptions for AP specialists to review, allowing them to focus on high-value decisions.\n", "\n", "## Next steps and upgrade paths\n", "\n", "You've built an invoice processing system that combines Llama's multimodal extraction with tool-calling refinement to handle real-world document complexity. The two-stage architecture provides a flexible foundation that you can adapt to different industries and processing volumes. The table below suggests a configuration for common invoice types and volumes.\n", "\n", "| Invoice Type | Recommended Approach | Why |\n", "|:------------|:--------------------|:----|\n", "| **Simple receipts** (< 10 items) | Stage 1 only | Multimodal extraction suffices for straightforward layouts |\n", "| **Complex invoices** (multiple currencies) | Both stages | Stage 2 enrichment adds critical currency normalization |\n", "| **High-value transactions** (> $10K) | Both stages + confidence scoring | Add verification techniques for risk mitigation |\n", "| **Batch processing** (> 100/day) | Adaptive routing | Use confidence thresholds to route only ambiguous cases to Stage 2 |\n", "\n", "### Expanding with production tools\n", "\n", "While this tutorial uses currency conversion to demonstrate tool calling, production systems typically integrate high-impact business tools:\n", "\n", "**Vendor Validation:** `validate_vendor` checks vendors against approved supplier databases, reducing fraud risk and ensuring compliance with procurement policies.\n", "\n", "**Duplicate Detection:** `duplicate_detection` prevents double payments by comparing invoice amounts, dates, and vendor details against recent payment history.\n", "\n", "**Budget Approval:** `check_budget_approval` verifies purchases against approved budgets and spending limits, enabling automated approval workflows for compliant transactions.\n", "\n", "Each additional tool follows the same pattern: define the tool 
schema, implement the function, and let the Llama model decide when to use it based on invoice data and business rules." ] }, { "cell_type": "markdown", "id": "1d9db4be", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 5 }