# Metadata Extraction from Invoice Documents

*Copyright (c) Meta Platforms, Inc. and affiliates.
This software may be used and distributed according to the terms of the Llama Community License Agreement.*

This tutorial shows you how to build an invoice processing system that automatically extracts structured data from invoice images. Using a two-stage pipeline with Llama's multimodal and text models, you'll transform varied invoice formats into clean, validated JSON ready for import into accounting systems.

While traditional OCR tools struggle with diverse layouts and require extensive template configuration, the approach here uses Llama's vision capabilities to understand any invoice format, enrich data with external services, and flag exceptions for human review—delivering the accuracy required for financial automation.

## What you will learn

- **Build a two-stage processing pipeline** that separates visual extraction from intelligent refinement for optimal accuracy and cost efficiency.
- **Leverage Llama's multimodal capabilities** to extract data from invoice images with diverse layouts and formats.
- **Use Llama API's tool calling** to enrich data with external services such as currency conversion APIs.
- **Implement JSON structured output** to ensure consistent, reliable data extraction every time.

| Component | Choice | Why |
|:----------|:-------|:----|
| **Architecture** | Two-Stage Pipeline | Separates concerns: Stage 1 focuses on accurate transcription, Stage 2 on refinement |
| **Stage 1 Model** | Llama 4 Maverick | Advanced vision capabilities for accurate text extraction from complex invoice layouts |
| **Stage 2 Model** | Llama 4 Scout | Fast performance and tool calling for data refinement, validation, and enrichment |
| **Infrastructure** | Llama API | Provides serverless, production-ready access to Llama models using the `llama_api_client` software development kit (SDK) |
| **Output Format** | JSON Structured Output | Guarantees consistent schema compliance for seamless integration with accounting systems |

---

**Note on Inference Providers:** This tutorial uses Llama API for demonstration purposes. However, you can run Llama models with any preferred inference provider. Common examples include [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html) and [Together AI](https://together.ai/llama). The core logic of this tutorial can be adapted to any of these providers.

## Scenario snapshot

- **Corpus:** The **Scanned Receipts OCR and Information Extraction (SROIE) dataset**, publicly available on [Kaggle](https://www.kaggle.com/datasets/urbikn/sroie-datasetv2). The dataset contains real-world receipts and invoices featuring diverse layouts, fonts, and formats that realistically simulate the challenges an AP department faces. Documents range from simple receipts to complex invoices with varying quality—from pristine digital exports to faded scans.

**Example Invoice from SROIE Dataset:**

![Sample Invoice](data/invoice_img/X51005230648.jpg)

*Sample Malaysian invoice showing typical extraction challenges: foreign currency (RM), mixed date formats (29/01/2018), vendor information, and varying text quality.*

- **Users:** The target users are **Accounts Payable (AP) specialists** and **accounting clerks** whose primary responsibility is to process incoming invoices for payment accurately and efficiently. They typically spend 3-5 minutes per invoice on manual data entry, leading to processing bottlenecks and human errors.

- **Typical Asks:** The core task is to extract structured data from each invoice document. A typical extraction task would be:
  - Extract the following from the invoice: `vendor_name`, `invoice_date`, `vendor_address`, `total_amount`, and `currency_symbol`
  - Convert any foreign currency amounts to USD using current exchange rates
  - Flag any invoices with missing required fields or validation errors for manual review
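
The third ask, flagging incomplete records, can be sketched as a small check over the extracted fields. This is a minimal illustration rather than part of the original notebook: the function name `find_review_flags` is hypothetical, and it assumes the "UNCLEAR" and "NOT_FOUND" sentinel strings that the Stage 1 prompt in this tutorial asks the model to emit.

```python
from typing import Dict, List

# Fields named in the typical extraction task.
REQUIRED_FIELDS = ["vendor_name", "invoice_date", "vendor_address",
                   "total_amount", "currency_symbol"]
# Sentinel values the Stage 1 prompt uses for uncertain fields (assumption).
SENTINELS = {"", "UNCLEAR", "NOT_FOUND"}

def find_review_flags(record: Dict[str, str]) -> List[str]:
    """Return human-readable reasons why a record needs manual review."""
    flags = []
    for field in REQUIRED_FIELDS:
        value = str(record.get(field) or "").strip()
        if value.upper() in SENTINELS:
            flags.append(f"missing or unclear field: {field}")
    # A present total_amount should parse as a non-negative number.
    amount = str(record.get("total_amount") or "").strip()
    if amount.upper() not in SENTINELS:
        try:
            if float(amount) < 0:
                flags.append("total_amount is negative")
        except ValueError:
            flags.append(f"total_amount is not numeric: {amount!r}")
    return flags

example = {"vendor_name": "OJC MARKETING SDN BHD", "invoice_date": "15/01/2019",
           "vendor_address": "NOT_FOUND", "total_amount": "193.00",
           "currency_symbol": "RM"}
print(find_review_flags(example))
```

An empty list means the record passed these basic checks; any non-empty list could be attached to the record so a reviewer sees why it was held back.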

## Solution: Two-stage intelligent processing pipeline

The solution is a two-stage pipeline that optimizes for both accuracy and cost by using different Llama models specialized for each stage's task:

```mermaid
flowchart TD
 A[Invoice Image] --> B[Stage 1: Accurate Transcription]
 B -- Raw JSON with Ambiguities --> C[Stage 2: Intelligent Refinement]
 C -- Enriched & Validated Data --> E[Final Structured JSON]
 
 subgraph "Multimodal Llama Model"
 B
 end
 
 subgraph "Text Refinement"
 C
 end
 
 D[External Tools, e.g., Currency API] <--> C
```

### Stage 1: Accurate transcription (vision to raw data)

The first stage acts as the system's "eyes," focusing entirely on converting visual information into structured text with maximum accuracy.

**Key Strategy**: The model is instructed to **extract, not invent**. It captures any visual ambiguities in the `extraction_notes` field (e.g., "Currency symbol appears to be 'RM' but could be 'SR' due to print quality"), creating a raw but faithful digital representation with documented uncertainties.

### Stage 2: Intelligent refinement (data to insights)

The second stage acts as the system's "brain," applying business logic, external knowledge, and validation rules to produce clean, reliable data.

The separation of concerns enables you to use expensive multimodal models only for visual tasks while leveraging faster, cheaper text models for data refinement—reducing costs by up to 60% compared to using multimodal models throughout.
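
To make the control flow concrete, here is a minimal sketch of how the two stages might be chained. The stage functions are passed in as parameters so the snippet stays self-contained; `run_pipeline`, `fake_stage1`, and `fake_stage2` are names invented for illustration, and the stubs stand in for the real model calls defined later in this tutorial. It assumes each stage returns a dictionary with a `success` flag, as the tutorial's stage functions do.

```python
from typing import Callable, Dict

def run_pipeline(image_path: str,
                 stage1: Callable[[str], Dict],
                 stage2: Callable[[Dict], Dict]) -> Dict:
    """Run Stage 1, then Stage 2 only if Stage 1 succeeded."""
    s1 = stage1(image_path)
    if not s1.get("success"):
        return {"success": False, "failed_stage": 1, "error": s1.get("error")}
    s2 = stage2(s1["raw_data"])
    if not s2.get("success"):
        # Keep the raw transcription so a reviewer can still inspect it.
        return {"success": False, "failed_stage": 2, "error": s2.get("error"),
                "raw_data": s1["raw_data"]}
    return {"success": True, "raw_data": s1["raw_data"],
            "enriched_data": s2["enriched_data"]}

# Placeholder stages (hypothetical) standing in for the model calls.
def fake_stage1(image_path: str) -> Dict:
    return {"success": True,
            "raw_data": {"vendor_name": "ACME", "total_amount": "10.00",
                         "currency_symbol": "RM", "extraction_notes": ""}}

def fake_stage2(raw: Dict) -> Dict:
    return {"success": True,
            "enriched_data": {"vendor_name": raw["vendor_name"],
                              "original_amount": raw["total_amount"],
                              "original_currency": "MYR"}}

result = run_pipeline("invoice.jpg", fake_stage1, fake_stage2)
print(result["success"], result["enriched_data"]["original_currency"])
```

Returning the raw transcription on a Stage 2 failure is a design choice in this sketch: the expensive vision call is not wasted, and the record can still go to manual review.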

## Prerequisites

Before you begin, ensure you have a Llama API key from [Llama API](https://llama.developer.meta.com/).

## Install dependencies

You will need a few libraries for this project: `llama-api-client` for API access and `pillow` for image handling.

In [None]:
# Install dependencies
!pip install --quiet llama-api-client pillow

## Imports & Llama API client setup

Import the necessary modules and initialize the `LlamaAPIClient` using your API key as an environment variable.


In [1]:
import os, sys, json
import base64
import textwrap
from pathlib import Path
from typing import Dict, List, Optional, Any
from datetime import datetime
from io import BytesIO
from PIL import Image
from pydantic import BaseModel, Field
from llama_api_client import LlamaAPIClient

# --- Llama client ---
API_KEY = os.getenv("LLAMA_API_KEY")
if not API_KEY:
    sys.exit("❌ Please set the LLAMA_API_KEY environment variable.")

client = LlamaAPIClient(api_key=API_KEY)

### Model configuration

The tutorial uses two Llama models, each suited to a different task: Llama 4 Maverick handles multimodal input (text + images), while Llama 4 Scout provides fast text processing with tool calling.

In [None]:
# --- Constants & Configuration ---
STAGE1_MODEL = "Llama-4-Maverick-17B-128E-Instruct-FP8"
STAGE2_MODEL = "Llama-4-Scout-17B-16E-Instruct-FP8"
MAX_COMPLETION_TOKENS = 2000 # Max tokens for model completion

# Setting a token limit is a best practice for controlling costs and ensuring 
# predictable performance. This value acts as a safeguard, preventing runaway 
# requests while being high enough to handle complex invoices.

## Load sample invoice data

The tutorial uses a small subset of the **SROIE dataset** (Scanned Receipts OCR and Information Extraction), containing real-world receipts and invoices with diverse layouts.

> The full [SROIE dataset](https://www.kaggle.com/datasets/urbikn/sroie-datasetv2) contains thousands of invoices with varying quality—from pristine digital exports to faded scans.

In [3]:
# Path to SROIE dataset samples
DATA_DIR = Path("data")
INVOICE_IMG_DIR = DATA_DIR / "invoice_img"
INVOICE_JSON_DIR = DATA_DIR / "invoice_json"

In [6]:
def load_sroie_data():
    """Load SROIE dataset invoice images and ground truth."""
    invoice_images = sorted(INVOICE_IMG_DIR.glob("*.jpg"))

    invoices = []
    for img_path in invoice_images:  # Load all invoice images
        json_path = INVOICE_JSON_DIR / f"{img_path.stem}.txt"

        ground_truth = {}
        if json_path.exists():
            with open(json_path, 'r') as f:
                ground_truth = json.loads(f.read())

        invoices.append({
            "image_path": str(img_path),
            "filename": img_path.name,
            "ground_truth": ground_truth
        })

    print(f"✅ {len(invoices)} invoices loaded")
    return invoices

def load_invoice_image(image_path: str) -> str:
    """Load and encode invoice image for API calls."""
    with open(image_path, "rb") as img_file:
        return base64.b64encode(img_file.read()).decode('utf-8')

# Load SROIE dataset
sroie_invoices = load_sroie_data()

✅ 10 invoices loaded


## Stage 1 - Visual extraction with multimodal model

**Stage 1 uses Llama 4 Maverick's multimodal capabilities** for accurate text extraction from complex invoice layouts. The key is to be **faithful to the source**—capturing exactly what's visible, including any ambiguities.

### Define the invoice schema

We'll use Pydantic models to define our expected output structure, ensuring consistent extraction across all invoices.

In [7]:
# --- Pydantic Models for Structured Output ---
class RawInvoiceData(BaseModel):
    """Model for Stage 1 raw extraction output - aligned with SROIE dataset."""
    vendor_name: str = Field(description="Company that issued the invoice")
    invoice_date: str = Field(
        description="Date invoice was issued (preserve original format)")
    vendor_address: str = Field(description="Vendor address as shown on invoice")
    total_amount: str = Field(
        description="Total amount as numeric string only (e.g., '123.45', no currency symbols)")
    currency_symbol: str = Field(
        description="Currency symbol or code found (e.g., 'RM', '$', 'USD')")
    extraction_notes: str = Field(
        description="Any visual ambiguities, unclear text, or alternative interpretations observed")

### Implement visual extraction

**Prompt strategy**: The prompt instructs the model to act as a high-fidelity transcriber, separating numeric amounts from currency symbols and documenting visual uncertainties in `extraction_notes` for Stage 2 to resolve.

In [None]:
def stage1_visual_extraction(image_path: str) -> Dict:
    """
    Stage 1: Extract raw data from invoice image using multimodal model.
    Focus: accurate transcription without interpretation.
    """

    system_prompt = """Extract invoice information from the image with perfect accuracy.

EXTRACTION RULES:
1. Extract ONLY what you clearly see - use "UNCLEAR" or "NOT_FOUND" if uncertain
2. Preserve original date formats (e.g., "15/01/2019")
3. total_amount: numeric value only (e.g., "123.45") - NO symbols or formatting
4. currency_symbol: symbol/code only (e.g., "RM", "$", "USD")
5. extraction_notes: document visual uncertainties and alternative interpretations

EXAMPLES of extraction_notes:
- "Currency symbol appears to be 'RM' but could be 'SR' due to print quality"
- "Date format ambiguous - could be DD/MM or MM/DD"
- "Vendor name partially obscured"

Output using the provided JSON schema."""

    # Load and encode image
    base64_image = load_invoice_image(image_path)

    user_content = [
        {
            "type": "text",
            "text": "Extract all information from this invoice image."
        },
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
        }
    ]

    response_format = {
        "type": "json_schema",
        "json_schema": {
            "name": RawInvoiceData.__name__,
            "schema": RawInvoiceData.model_json_schema(),
        },
    }

    try:
        response = client.chat.completions.create(
            model=STAGE1_MODEL,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_content}
            ],
            temperature=0.1,
            max_completion_tokens=MAX_COMPLETION_TOKENS,
            response_format=response_format
        )
        result = json.loads(response.completion_message.content.text)

        return {'success': True, 'raw_data': result}

    except Exception as e:
        return {'success': False, 'error': str(e)}

Now let's process all SROIE invoices through Stage 1:

In [None]:
from typing import Any, Dict, Optional

def normalize_company_name(name: Optional[str]) -> str:
    """Normalize company name for consistent comparison."""
    return (name or "").upper().strip()

def normalize_amount(amount: Any) -> str:
    """Normalize amount for consistent comparison by removing symbols and whitespace."""
    amount_str = str(amount or "")
    return amount_str.replace('$', '').replace('RM', '').strip()

def evaluate_extraction(data: Dict, gt: Dict, amount_field: str = 'total_amount') -> tuple[bool, bool]:
    """Shared evaluation logic for both stages using normalization helpers."""
    company_norm = normalize_company_name(data.get('vendor_name'))
    gt_company_norm = normalize_company_name(gt.get('company'))

    amount_str = normalize_amount(data.get(amount_field))
    gt_amount_str = normalize_amount(gt.get('total'))

    return company_norm == gt_company_norm, amount_str == gt_amount_str

def print_comparison(i, filename, data, gt, company_match, total_match,
                     stage_name, amount_field='total_amount', extra_info=None):
    """Shared printing logic for both stages."""
    print(f"[{i}] {filename}")
    print(f"{stage_name}: {data.get('vendor_name')} | {data.get(amount_field)}")
    print(f"Target: {gt.get('company')} | {gt.get('total')}")
    overall_match = company_match and total_match
    print(f"{'✅' if overall_match else '❌'} {'✓' if overall_match else '✗'} "
          f"(Company:{'✓' if company_match else '✗'} | Amount:{'✓' if total_match else '✗'})")
    if extra_info:
        print(extra_info)
    print()

In [None]:
print("🔍 Processing invoices through Stage 1...")

stage1_results = []
for i, invoice in enumerate(sroie_invoices, 1):
    result = stage1_visual_extraction(invoice['image_path'])

    if result['success']:
        raw_data, gt = result['raw_data'], invoice['ground_truth']

        # Normalize for comparison
        company_match, total_match = evaluate_extraction(raw_data, gt)

        print(f"[{i}] {invoice['filename']}")
        print(f"Extracted: {raw_data.get('vendor_name')} | {raw_data.get('total_amount')}")
        print(f"Target {gt.get('company')} | {gt.get('total')}")
        if raw_data.get('extraction_notes') and raw_data['extraction_notes'].strip():
            print(f"📝 Notes: {raw_data['extraction_notes']}")
        overall_match = company_match and total_match
        print(f"{'✅' if overall_match else '❌'} {'✓' if overall_match else '✗'} "
              f"(Company:{'✓' if company_match else '✗'} | "
              f"Amount:{'✓' if total_match else '✗'})\n")

        stage1_results.append({
            'invoice': invoice, 'result': result,
            'company_match': company_match, 'total_match': total_match
        })
    else:
        print(f"[{i}] {invoice['filename']} ❌ FAILED\n")
        stage1_results.append({
            'invoice': invoice, 'result': result,
            'company_match': False, 'total_match': False
        })

successful = [r for r in stage1_results if r['result']['success']]
company_accuracy = (sum(1 for r in successful if r['company_match']) /
                    len(successful) if successful else 0)
total_accuracy = (sum(1 for r in successful if r['total_match']) /
                  len(successful) if successful else 0)

print(f"✅ Stage 1: {len(successful)}/{len(sroie_invoices)} processed | "
      f"Accuracy: {company_accuracy:.1%} company, {total_accuracy:.1%} amount")

🔍 Processing invoices through Stage 1...
[1] X00016469670.jpg
Extracted: OJC MARKETING SDN BHD | 193.00
Target OJC MARKETING SDN BHD | 193.00
📝 Notes: The currency symbol 'SR' is used, which typically represents Saudi Riyal. The invoice is clearly marked as a 'TAX INVOICE' and includes details such as invoice number, date, cashier, sales person, and bill to information. The product details and total amount are also clearly listed.
✅ ✓ (Company:✓ | Amount:✓)

[2] X00016469671.jpg
Extracted: OJC MARKETING SDN BHD | 170.00
Target OJC MARKETING SDN BHD | 170.00
📝 Notes: The currency symbol is not explicitly shown on the invoice, but the amounts are listed with two decimal places, suggesting a currency that uses this format, such as MYR (Malaysian Ringgit). The vendor is based in Malaysia, supporting this interpretation.
✅ ✓ (Company:✓ | Amount:✓)

[3] X51005200931.jpg
Extracted: PERNIAGAAN ZHENG HUI | 436.20
Target PERNIAGAAN ZHENG HUI | 436.20
📝 Notes: The invoice is clear and legible, wi

## Stage 2 - Intelligent refinement with tool calling

**Stage 2 uses Llama 4 Scout** for fast performance and tool calling. It applies business logic to resolve the ambiguities documented in Stage 1 and enriches the data with external services.

### Define tools for external services

We'll create tools that the model can use to enrich the extracted data. These tools enable the system to perform currency conversion and data enrichment.

> **Learn more about tool calling**: For comprehensive guidance on implementing tool calling with Llama models, see the [Meta Tool Calling Guide](https://llama.developer.meta.com/docs/features/tool-calling/).

**Tool strategy**: In this tutorial we use currency conversion to demonstrate tool calling with clear purpose and structured data handling. This same pattern extends to other tools such as vendor validation, tax calculation, compliance checks, and other business logic integrations.

**Implementation note**: The currency conversion implementation below uses static exchange rates for tutorial simplicity. In production, you would integrate with live currency APIs such as ExchangeRate-API, Fixer.io, or your financial system's currency service.
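
One way to prepare for that swap is to hide the rate source behind a small lookup object, so the tool implementation does not care whether rates are static or live. The sketch below is an illustration under stated assumptions: `RateProvider` is a hypothetical class, `fetch_live_rate` is a hypothetical callable you would write around your chosen provider (no real API is called here), and the fallback table reuses the static rates from this tutorial's `convert_currency` implementation.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Static fallback rates to USD (same values as the tutorial's static table).
STATIC_RATES_TO_USD: Dict[str, float] = {"MYR": 0.21, "EUR": 1.09, "GBP": 1.27, "SGD": 0.74}

class RateProvider:
    """Look up a rate to USD, preferring a live source and caching results."""

    def __init__(self, fetch_live_rate: Optional[Callable[[str], float]] = None,
                 ttl_seconds: float = 3600.0):
        self._fetch = fetch_live_rate   # hypothetical live-rate callable
        self._ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, float]] = {}  # code -> (rate, fetched_at)

    def get_rate(self, currency: str) -> Tuple[float, str]:
        """Return (rate, source), where source is 'cache', 'live', or 'static'."""
        code = currency.upper()
        cached = self._cache.get(code)
        if cached and time.monotonic() - cached[1] < self._ttl:
            return cached[0], "cache"
        if self._fetch is not None:
            try:
                rate = float(self._fetch(code))
                self._cache[code] = (rate, time.monotonic())
                return rate, "live"
            except Exception:
                pass  # live lookup failed: fall through to the static table
        if code in STATIC_RATES_TO_USD:
            return STATIC_RATES_TO_USD[code], "static"
        raise KeyError(f"No rate available for {code}")

provider = RateProvider()          # no live source configured
print(provider.get_rate("myr"))    # falls back to the static table
```

Returning the source alongside the rate lets downstream records state whether a conversion used a live or a fallback rate, which may matter for audit trails.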

In [12]:
# --- Tool Definitions ---
def get_currency_conversion_tool():
    """Define the currency conversion tool for Stage 2."""
    return {
        "type": "function",
        "function": {
            "name": "convert_currency",
            "description": "Convert amount from one currency to USD using static exchange rates",
            "parameters": {
                "type": "object",
                "properties": {
                    "amount": {
                        "type": "number",
                        "description": "The amount to convert"
                    },
                    "from_currency": {
                        "type": "string",
                        "description": "Source currency code (MYR, EUR, GBP, SGD)"
                    },
                    "to_currency": {
                        "type": "string",
                        "description": "Target currency code (USD only)"
                    }
                },
                "required": ["amount", "from_currency", "to_currency"]
            }
        }
    }

In [13]:
def convert_currency(amount: float, from_currency: str, to_currency: str = "USD") -> Dict:
    """Convert currency amounts using static exchange rates (tutorial implementation)."""

    # Static rates for tutorial (August 2024)
    rates = {"MYR": 0.21, "EUR": 1.09, "GBP": 1.27, "SGD": 0.74}

    # Validate inputs
    if not isinstance(amount, (int, float)) or amount < 0:
        return {"error": "Amount must be a non-negative number"}

    if from_currency == to_currency:
        return {
            "converted_amount": float(amount),
            "exchange_rate": 1.0,
            "note": "No conversion needed"
        }

    if from_currency in rates and to_currency == "USD":
        converted = amount * rates[from_currency]
        return {
            "converted_amount": round(converted, 2),
            "exchange_rate": rates[from_currency],
            "source_currency": from_currency,
            "target_currency": to_currency
        }

    return {"error": f"Currency conversion from {from_currency} to {to_currency} not supported"}

In [14]:
def execute_tool(tool_name: str, arguments: Dict) -> Dict:
    """Execute the requested tool and return results."""

    if tool_name == "convert_currency":
        try:
            return convert_currency(
                amount=arguments["amount"],
                from_currency=arguments["from_currency"],
                to_currency=arguments["to_currency"]
            )
        except KeyError as e:
            return {"error": f"Missing required argument: {e}"}
        except (TypeError, ValueError) as e:
            return {"error": f"Invalid argument type: {e}"}

    return {"error": f"Unknown tool: {tool_name}"}

### Define enriched output schema

The enriched schema structures Stage 1's raw extraction into clean, business-ready data with currency conversion and processing transparency.

In [15]:
class EnrichedInvoiceData(BaseModel):
    """Model for Stage 2 enriched output with currency conversion."""
    vendor_name: str = Field(description="Vendor company name")
    vendor_address: str = Field(description="Vendor address")
    invoice_date: str = Field(
        description="Invoice date in ISO format (YYYY-MM-DD)")
    original_amount: str = Field(
        description="Original amount as numeric string (e.g., '123.45', no symbols)")
    original_currency: str = Field(
        description="Original currency code (e.g., 'MYR', 'USD')")
    converted_amount_usd: str = Field(
        description="USD amount as numeric string (e.g., '25.89', no symbols)")
    exchange_rate: str = Field(
        description="Exchange rate as numeric string (e.g., '0.21')")
    reasoning_notes: str = Field(description="AI reasoning summary")

### Implement intelligent refinement

**Processing strategy**: We use Llama 4 Scout with structured JSON output and tool calling to enrich the raw data from Stage 1. The model analyzes `extraction_notes` to resolve documented ambiguities using business context, standardizes dates, follows strict numeric formatting rules, and converts amounts to USD.

In [None]:
def stage2_intelligent_refinement(raw_data: Dict) -> Dict:
    """Stage 2: Refine and enrich raw extracted data using tool calling."""

    # Validate input
    if not isinstance(raw_data, dict):
        return {'success': False, 'error': 'Invalid input: raw_data must be a dictionary', 'stage': 2}

    required_fields = ['vendor_name', 'total_amount', 'currency_symbol']
    missing_fields = [field for field in required_fields if field not in raw_data]
    if missing_fields:
        return {'success': False, 'error': f'Missing required fields: {missing_fields}', 'stage': 2}

    try:
        tools = [get_currency_conversion_tool()]
        response_format = {
            "type": "json_schema",
            "json_schema": {
                "name": EnrichedInvoiceData.__name__,
                "schema": EnrichedInvoiceData.model_json_schema(),
            },
        }

        refinement_prompt = f"""Analyze the raw invoice data below and enrich it with currency conversion.

Raw data from Stage 1:
{json.dumps(raw_data, indent=2)}

TASKS:
1. Extract original_amount from total_amount (numeric string only)
2. Resolve currency ambiguities using extraction_notes + context clues
3. Standardize date to YYYY-MM-DD format
4. Use convert_currency tool for non-USD amounts
5. Document reasoning in reasoning_notes

AMBIGUITY RESOLUTION:
- Use extraction_notes to identify uncertainties from Stage 1
- Attempt to resolve uncertainties based on the information you have
- Apply context clues: Malaysian addresses suggest MYR currency
- Reference extraction_notes findings in your reasoning_notes

OUTPUT FORMATTING (numeric strings only):
- original_amount: "123.45" (no symbols)
- converted_amount_usd: "25.89" (no symbols)
- exchange_rate: "0.21" (no symbols)
- original_currency: "MYR" (code only, not "RM")
"""

        messages = [{"role": "user", "content": refinement_prompt}]

        response = client.chat.completions.create(
            model=STAGE2_MODEL,
            messages=messages,
            tools=tools,
            temperature=0.3,
            max_completion_tokens=MAX_COMPLETION_TOKENS,
            response_format=response_format
        )

        # Handle tool calls if the model wants to use them
        if response.completion_message.tool_calls:
            # Add assistant's response to conversation
            messages.append({
                "role": "assistant",
                "tool_calls": [{
                    "id": call.id,
                    "function": {
                        "name": call.function.name,
                        "arguments": call.function.arguments
                    }
                } for call in response.completion_message.tool_calls]
            })

            # Execute tools and add results
            for tool_call in response.completion_message.tool_calls:
                try:
                    arguments = json.loads(tool_call.function.arguments)
                    tool_result = execute_tool(tool_call.function.name, arguments)

                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": json.dumps(tool_result)
                    })
                except json.JSONDecodeError:
                    return {'success': False, 'error': 'Invalid tool arguments JSON', 'stage': 2}

            # Get final structured response
            messages.append({"role": "user", "content":
                "Provide the complete enriched invoice data following the required schema. "
                "Include conversion details in converted_amount_usd and exchange_rate fields."})

            final_response = client.chat.completions.create(
                model=STAGE2_MODEL,
                messages=messages,
                temperature=0.1,
                max_completion_tokens=MAX_COMPLETION_TOKENS,
                response_format=response_format
            )

            try:
                enriched_data = json.loads(final_response.completion_message.content.text)
            except json.JSONDecodeError:
                return {'success': False, 'error': 'Invalid JSON response from final call', 'stage': 2}
        else:
            # No tools needed, parse direct response
            try:
                enriched_data = json.loads(response.completion_message.content.text)
            except json.JSONDecodeError:
                return {'success': False, 'error': 'Invalid JSON response from initial call', 'stage': 2}

        # Validate required fields in enriched data
        required_enriched_fields = ['vendor_name', 'original_amount']
        missing_enriched = [field for field in required_enriched_fields if field not in enriched_data]
        if missing_enriched:
            return {'success': False, 'error': f'Missing enriched fields: {missing_enriched}', 'stage': 2}

        return {
            'success': True,
            'stage': 2,
            'enriched_data': enriched_data
        }

    except Exception as e:
        return {
            'success': False,
            'error': str(e),
            'stage': 2
        }
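
Because the final JSON is written by the model, it can be worth re-checking the date format and the conversion arithmetic deterministically before a record leaves the pipeline. The following sketch is not part of the original notebook: `check_enriched_record` and the 0.01 tolerance are illustrative assumptions. It verifies that `invoice_date` parses as YYYY-MM-DD and that `converted_amount_usd` agrees with `original_amount` multiplied by `exchange_rate`, rounded to two decimals.

```python
from datetime import datetime
from typing import Dict, List

def check_enriched_record(record: Dict[str, str], tolerance: float = 0.01) -> List[str]:
    """Return a list of consistency problems found in a Stage 2 record."""
    problems = []
    # 1. The date must be ISO formatted (YYYY-MM-DD).
    date_text = str(record.get("invoice_date") or "")
    try:
        datetime.strptime(date_text, "%Y-%m-%d")
    except ValueError:
        problems.append(f"invoice_date not ISO: {date_text!r}")
    # 2. The converted amount must agree with amount * rate.
    try:
        amount = float(record["original_amount"])
        rate = float(record["exchange_rate"])
        converted = float(record["converted_amount_usd"])
    except (KeyError, TypeError, ValueError):
        problems.append("amount, rate, or converted value missing or non-numeric")
    else:
        expected = round(amount * rate, 2)
        if abs(expected - converted) > tolerance:
            problems.append(f"converted_amount_usd {converted} != expected {expected}")
    return problems

record = {"invoice_date": "2019-01-15", "original_amount": "100.00",
          "exchange_rate": "0.21", "converted_amount_usd": "21.00"}
print(check_enriched_record(record))  # a consistent record yields no problems
```

Records with a non-empty problem list could be marked as failed or routed to manual review instead of being exported.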

Now let's process the successful Stage 1 results through Stage 2:

In [17]:
print("🧠 Processing invoices through Stage 2 (intelligent refinement)...")

stage2_results = []
successful_stage1 = [r for r in stage1_results if r['result']['success']]

for i, stage1_result in enumerate(successful_stage1, 1):
    invoice = stage1_result['invoice']
    result = stage2_intelligent_refinement(stage1_result['result']['raw_data'])

    if result['success']:
        enriched, gt = result['enriched_data'], invoice['ground_truth']
        company_match, total_match = evaluate_extraction(enriched, gt, 'original_amount')

        # Currency conversion info
        extra_info = None
        if enriched.get('converted_amount_usd') and enriched.get('original_currency') not in ['USD', '$']:
            curr = enriched.get('original_currency', 'Unknown')
            usd = enriched.get('converted_amount_usd', 'N/A')
            extra_info = f"💱 Converted: {curr} → USD ${usd}"

        print_comparison(i, invoice['filename'], enriched, gt, company_match, total_match, "Enriched", 'original_amount', extra_info)

        stage2_results.append({
            'invoice': invoice, 'stage1_result': stage1_result, 'stage2_result': result,
            'company_match': company_match, 'total_match': total_match
        })
    else:
        print(f"[{i}] {invoice['filename']} ❌ FAILED: {result.get('error', 'Unknown error')}\n")
        stage2_results.append({
            'invoice': invoice, 'stage1_result': stage1_result, 'stage2_result': result,
            'company_match': False, 'total_match': False
        })

successful_stage2 = [r for r in stage2_results if r['stage2_result']['success']]
company_accuracy = sum(1 for r in successful_stage2 if r['company_match']) / len(successful_stage2) if successful_stage2 else 0
total_accuracy = sum(1 for r in successful_stage2 if r['total_match']) / len(successful_stage2) if successful_stage2 else 0
conversions = [r for r in successful_stage2 if r['stage2_result']['enriched_data'].get('converted_amount_usd')]

print(f"✅ Stage 2: {len(successful_stage2)}/{len(successful_stage1)} enriched | "
      f"Accuracy: {company_accuracy:.1%} company, {total_accuracy:.1%} amount")
print(f" Currency conversions: {len(conversions)} invoices")

🧠 Processing invoices through Stage 2 (intelligent refinement)...
[1] X00016469670.jpg
Enriched: OJC MARKETING SDN BHD | 193.00
Target: OJC MARKETING SDN BHD | 193.00
✅ ✓ (Company:✓ | Amount:✓)
💱 Converted: MYR → USD $46.09

[2] X00016469671.jpg
Enriched: OJC MARKETING SDN BHD | 170.00
Target: OJC MARKETING SDN BHD | 170.00
✅ ✓ (Company:✓ | Amount:✓)
💱 Converted: MYR → USD $40.09

[3] X51005200931.jpg
Enriched: PERNIAGAAN ZHENG HUI | 436.20
Target: PERNIAGAAN ZHENG HUI | 436.20
✅ ✓ (Company:✓ | Amount:✓)
💱 Converted: MYR → USD $97.53

[4] X51005230605.jpg
Enriched: PETRON BKT LANJAN SB | 4.90
Target: PETRON BKT LANJAN SB | 4.90
✅ ✓ (Company:✓ | Amount:✓)
💱 Converted: MYR → USD $1.16

[5] X51005230616.jpg
Enriched: Gerbang Alaf Restaurants Sdn Bhd (formerly known as Golden Arches Restaurants Sdn Bhd) | 38.90
Target: GERBANG ALAF RESTAURANTS SDN BHD | 38.90
❌ ✗ (Company:✗ | Amount:✓)
💱 Converted: MYR → USD $,{

[6] X51005230621.jpg
Enriched: SIN LIANHAP SDN BHD | 7.30
Target: SIN LIANHAP

Let's examine the final structured outputs from our two-stage pipeline:

In [18]:
print("📋 Final Structured Outputs from Two-Stage Pipeline:\n")

for i, result in enumerate(successful_stage2, 1):
    invoice = result['invoice']
    enriched_data = result['stage2_result']['enriched_data']

    print(f"[{i}] {invoice['filename']}:")
    print(json.dumps(enriched_data, indent=2))
    print("-" * 50)

📋 Final Structured Outputs from Two-Stage Pipeline:

[1] X00016469670.jpg:
{
 "vendor_name": "OJC MARKETING SDN BHD",
 "vendor_address": "NO 2 & 4, JALAN BAYU 4, BANDAR SERI ALAM, 81750 MASAI, JOHOR",
 "invoice_date": "2019-01-15",
 "original_amount": "193.00",
 "original_currency": "MYR",
 "converted_amount_usd": "46.09",
 "exchange_rate": "0.2387",
 "reasoning_notes": "The currency symbol 'SR' was initially provided, but based on the vendor address in Malaysia and the extraction notes, it seems there was a confusion. The address suggests the currency is likely MYR. The amount 193.00 was converted from MYR to USD using the exchange rate 0.2387, resulting in 46.09 USD."
}
--------------------------------------------------
[2] X00016469671.jpg:
{
 "vendor_name": "OJC MARKETING SDN BHD",
 "vendor_address": "NO 2 & 4, JALAN BAYU 4, BANDAR SERI ALAM, 81750 MASAI, JOHOR",
 "invoice_date": "2019-02-01",
 "original_amount": "170.00",
 "original_currency": "MYR",
 "converted_amount_usd": "40.0

### Analyzing Common Failure Patterns

Even with high accuracy, LLM-based extraction can produce errors that reveal common challenges:

- **Over-extraction:** The model extracts technically correct but contextually excessive information (e.g., including a company's former name). This pattern often requires stricter output formatting rules or more sophisticated post-processing.

- **Ambiguous Layout:** The model incorrectly identifies a field because the document layout contains multiple plausible candidates (e.g., extracting a "payment due" amount instead of the "invoice total"). This class of errors is often best handled by implementing confidence scores to flag ambiguous cases for human review.
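
As one illustration of post-processing for the over-extraction pattern, a parenthetical aside such as "(formerly known as ...)" can be stripped before comparison or export. This is a heuristic sketch rather than the tutorial's method: `strip_parentheticals` is a hypothetical helper, and removing all parenthesized text may be too aggressive for vendors whose legal names contain parentheses.

```python
import re

def strip_parentheticals(vendor_name: str) -> str:
    """Remove parenthesized asides and collapse leftover whitespace."""
    without_parens = re.sub(r"\s*\([^)]*\)", "", vendor_name)
    return re.sub(r"\s+", " ", without_parens).strip()

name = ("Gerbang Alaf Restaurants Sdn Bhd "
        "(formerly known as Golden Arches Restaurants Sdn Bhd)")
cleaned = strip_parentheticals(name)
print(cleaned.upper())
```

Applied before the uppercase comparison used in the evaluation helpers, this kind of cleanup would let the vendor name in the example match its ground-truth label.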

To address these failure patterns, you can implement **confidence scoring** to automatically flag ambiguous layouts for human review—a critical step for high-value transactions. For issues like over-extraction, you can refine your prompts with more specific formatting instructions or provide **few-shot examples** to guide the model toward the desired output structure.
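
A simple starting point for confidence scoring is a rule-based score over the Stage 1 output. The weights, keyword list, and 0.7 threshold below are arbitrary assumptions chosen for illustration, and `score_extraction` and `needs_review` are hypothetical helpers; a production system would calibrate these values against labeled invoices.

```python
from typing import Dict

UNCERTAIN_VALUES = {"", "UNCLEAR", "NOT_FOUND"}
# Phrases in extraction_notes that hint at visual ambiguity (illustrative list).
AMBIGUITY_KEYWORDS = ("could be", "ambiguous", "unclear", "obscured", "not explicitly")
KEY_FIELDS = ("vendor_name", "invoice_date", "total_amount", "currency_symbol")

def score_extraction(raw: Dict[str, str]) -> float:
    """Return a heuristic confidence score between 0.0 and 1.0."""
    score = 1.0
    for field in KEY_FIELDS:
        if str(raw.get(field) or "").strip().upper() in UNCERTAIN_VALUES:
            score -= 0.25   # penalty per missing or unclear key field
    notes = str(raw.get("extraction_notes") or "").lower()
    if any(keyword in notes for keyword in AMBIGUITY_KEYWORDS):
        score -= 0.2        # penalty when the notes mention ambiguity
    return max(0.0, round(score, 2))

def needs_review(raw: Dict[str, str], threshold: float = 0.7) -> bool:
    """Flag the record for human review when the score is below the threshold."""
    return score_extraction(raw) < threshold

raw = {"vendor_name": "OJC MARKETING SDN BHD", "invoice_date": "15/01/2019",
       "total_amount": "193.00", "currency_symbol": "UNCLEAR",
       "extraction_notes": "Currency symbol could be 'RM' or 'SR'"}
print(score_extraction(raw), needs_review(raw))
```

Keyword matching on free-text notes is crude, so treat the score as a triage signal, not a probability.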

These failure patterns underscore that the goal is not to eliminate human involvement, but to augment it. A successful system reliably handles the majority of invoices, while intelligently flagging complex exceptions for AP specialists to review, allowing them to focus on high-value decisions.

## Next steps and upgrade paths

You've built an invoice processing system that combines Llama's multimodal and text models to handle real-world document complexity. The two-stage architecture provides a flexible foundation that can be adapted to various industries. The table below suggests how to match the pipeline to invoice type and volume.

| Invoice Type | Recommended Approach | Why |
|:------------|:--------------------|:----|
| **Simple receipts** (< 10 items) | Stage 1 only | Multimodal extraction suffices for straightforward layouts |
| **Complex invoices** (multiple currencies) | Both stages | Stage 2 enrichment adds critical currency normalization |
| **High-value transactions** (> $10K) | Both stages + confidence scoring | Add verification techniques for risk mitigation |
| **Batch processing** (> 100/day) | Adaptive routing | Use confidence thresholds to route only ambiguous cases to Stage 2 |
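
The routing table above can be expressed as a small decision function. The sketch below is hypothetical: the function name `choose_route`, the step labels, the `confidence` input (for example, from whatever scoring heuristic you define over Stage 1 output), and the 0.8 threshold are all assumptions; only the $10K cutoff comes from the table.

```python
from typing import List

HIGH_VALUE_USD = 10_000.0  # cutoff taken from the table above

def choose_route(currency_code: str, amount_usd_estimate: float,
                 confidence: float, confidence_threshold: float = 0.8) -> List[str]:
    """Return the ordered processing steps for one invoice."""
    steps = ["stage1"]
    needs_conversion = currency_code.strip().upper() not in ("USD", "$")
    ambiguous = confidence < confidence_threshold
    high_value = amount_usd_estimate > HIGH_VALUE_USD
    # Stage 2 runs when enrichment is needed or the extraction looks uncertain.
    if needs_conversion or ambiguous or high_value:
        steps.append("stage2")
    # High-value invoices get an extra verification step.
    if high_value:
        steps.append("confidence_review")
    return steps

print(choose_route("USD", 42.50, confidence=0.95))
print(choose_route("MYR", 193.00, confidence=0.95))
print(choose_route("USD", 25_000.00, confidence=0.95))
```

Keeping the routing decision in one pure function makes the thresholds easy to test and adjust as invoice volume grows.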

### Expanding with production tools

While this tutorial uses currency conversion to demonstrate tool calling, production systems typically integrate high-impact business tools:

**Vendor Validation**: `validate_vendor` checks vendors against approved supplier databases, reducing fraud risk and ensuring compliance with procurement policies.

**Duplicate Detection**: `duplicate_detection` prevents double payments by comparing invoice amounts, dates, and vendor details against recent payment history.

**Budget Approval**: `check_budget_approval` verifies purchases against approved budgets and spending limits, enabling automated approval workflows for compliant transactions.

Each additional tool follows the same pattern: define the tool schema, implement the function, and let the Llama model decide when to use it based on invoice data and business rules.
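
As a sketch of that pattern, the snippet below adds a duplicate-detection tool alongside a registry-based dispatcher. Everything here is illustrative: the `check_duplicate` function, its schema, the `dispatch_tool` helper, and the in-memory `PAYMENT_HISTORY` list are hypothetical stand-ins for a real payment database, and the schema layout simply mirrors the `convert_currency` tool definition used earlier in this tutorial.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory payment history; a real system would query a database.
PAYMENT_HISTORY: List[Dict[str, str]] = [
    {"vendor_name": "OJC MARKETING SDN BHD", "invoice_date": "2019-01-15", "amount": "193.00"},
]

def check_duplicate(vendor_name: str, invoice_date: str, amount: str) -> Dict:
    """Report whether an invoice matches a previously paid one."""
    key = (vendor_name.strip().upper(), invoice_date, f"{float(amount):.2f}")
    for paid in PAYMENT_HISTORY:
        paid_key = (paid["vendor_name"].strip().upper(), paid["invoice_date"],
                    f"{float(paid['amount']):.2f}")
        if key == paid_key:
            return {"is_duplicate": True, "matched": paid}
    return {"is_duplicate": False}

# Step 1: tool schema, in the same shape as the convert_currency definition.
CHECK_DUPLICATE_TOOL = {
    "type": "function",
    "function": {
        "name": "check_duplicate",
        "description": "Check whether an invoice was already paid",
        "parameters": {
            "type": "object",
            "properties": {
                "vendor_name": {"type": "string"},
                "invoice_date": {"type": "string", "description": "YYYY-MM-DD"},
                "amount": {"type": "string", "description": "Numeric string"},
            },
            "required": ["vendor_name", "invoice_date", "amount"],
        },
    },
}

# Step 2: registry mapping tool names to Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., Dict]] = {"check_duplicate": check_duplicate}

def dispatch_tool(tool_name: str, arguments: Dict) -> Dict:
    """Step 3: run whichever registered tool the model requested."""
    func = TOOL_REGISTRY.get(tool_name)
    if func is None:
        return {"error": f"Unknown tool: {tool_name}"}
    try:
        return func(**arguments)
    except (TypeError, ValueError) as exc:
        return {"error": f"Invalid arguments: {exc}"}

print(dispatch_tool("check_duplicate", {"vendor_name": "ojc marketing sdn bhd",
                                         "invoice_date": "2019-01-15", "amount": "193"}))
```

With a registry, adding another tool means writing one function, one schema, and one dictionary entry, without touching the dispatch logic.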