{ "cells": [ { "cell_type": "markdown", "id": "0e4aad87-ddd4-4b5e-a83f-63a75bd89f38", "metadata": {}, "source": [ "# PowerPoint to Knowledge-Grounded & Narrative-Aware Voiceover Transcript Generator\n", "\n", "This cookbook demonstrates the complete workflow for converting PowerPoint presentations into AI-generated voiceover transcripts with retrieval augmentation and narrative continuity, powered by Llama 4 Maverick's vision capabilities via the Groq API.\n", "\n", "## Overview\n", "\n", "This workflow performs the following operations:\n", "\n", "1. **Content Extraction**: Pulls speaker notes and visual elements from PowerPoint slides\n", "2. **Knowledge Base Integration**: Leverages external knowledge sources to enhance transcript quality (for this cookbook, the local `knowledge_base` folder)\n", "3. **Image Conversion**: Transforms slides into high-quality images for analysis by Llama 4 Maverick\n", "4. **Context-Aware Generation**: Creates natural-sounding voiceover content with narrative continuity and knowledge-based insights\n", " - **Speech Optimization**: Converts numbers, technical terms, and abbreviations to spoken form\n", "5. 
**Results Export**: Saves transcripts, context information, and knowledge usage statistics in multiple formats\n", "\n", "## Key Features\n", "\n", "- **Knowledge Base Integration**: Automatically retrieves relevant information from markdown knowledge files\n", "- **Unified Processor**: Single class handles both standard and narrative-aware processing with knowledge enhancement\n", "- **Configurable Context**: Adjustable context window for narrative continuity and knowledge retrieval\n", "- **Mode Selection**: Toggle between standard and narrative processing with optional knowledge integration\n", "- **Performance Optimization**: Caching and lazy loading for efficient knowledge retrieval\n", "\n", "## Prerequisites\n", "\n", "Before running this notebook, ensure you have:\n", "- Created a `.env` file with your `GROQ_API_KEY`\n", "- Updated `config.yaml` with your presentation file path\n", "- Set up your knowledge base directory with relevant markdown files (only Markdown files are supported at the moment)\n", "- Enabled knowledge base features in `config.yaml` (set `knowledge.enabled: true`)\n", "\n" ] }, { "cell_type": "markdown", "id": "b3367845-76ad-4493-a312-f80f00fad029", "metadata": {}, "source": [ "\n", "## Setup and Configuration\n", "\n", "Import required libraries and load environment configuration."
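, "\n", "\n", "For reference, a minimal `config.yaml` might look like the sketch below. The keys mirror the ones this notebook's code reads (`api.groq_model`, `processing.*`, `current_project.*`, `knowledge.*`), and the values shown match this cookbook's example run; treat them as placeholders for your own project:\n", "\n", "```yaml\n", "api:\n", "  groq_model: meta-llama/llama-4-maverick-17b-128e-instruct\n", "processing:\n", "  default_dpi: 200\n", "  default_format: png\n", "current_project:\n", "  pptx_file: input/All About Llamas\n", "  extension: .pptx\n", "  output_dir: output/\n", "knowledge:\n", "  enabled: true\n", "  knowledge_base_dir: knowledge_base\n", "  context:\n", "    strategy: combined\n", "    knowledge_weight: 0.3\n", "  embedding:\n", "    model_name: all-MiniLM-L6-v2\n", "  vector_store:\n", "    index_type: flat\n", "    use_gpu: false\n", "```"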
] }, { "cell_type": "code", "execution_count": 50, "id": "37249034-75bf-41bd-b640-eb6345435f47", "metadata": {}, "outputs": [], "source": [ "# Import required libraries\n", "import pandas as pd\n", "import os\n", "from pathlib import Path\n", "from dotenv import load_dotenv\n", "import matplotlib.pyplot as plt\n", "from IPython.display import display" ] }, { "cell_type": "code", "execution_count": 51, "id": "0aedb2c5-5762-43ae-826b-fdb45ff642f5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SUCCESS: Environment loaded successfully!\n", "SUCCESS: GROQ API key found\n" ] } ], "source": [ "# Load environment variables from .env file\n", "load_dotenv()\n", "\n", "# Verify setup\n", "if os.getenv('GROQ_API_KEY'):\n", " print(\"SUCCESS: Environment loaded successfully!\")\n", " print(\"SUCCESS: GROQ API key found\")\n", "else:\n", " print(\"WARNING: GROQ_API_KEY not found in .env file\")\n", " print(\"Please check your .env file and add your API key\")" ] }, { "cell_type": "code", "execution_count": 52, "id": "0563bb13-9dbd-4a29-9b3b-f565befd2001", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SUCCESS: All modules imported successfully!\n", "- PPTX processor ready\n", "- Unified transcript generator ready\n", "- Configuration manager ready\n", "- Visualization generator ready\n", "- FAISS knowledge base components ready\n" ] } ], "source": [ "# Import custom modules\n", "try:\n", " from src.core.pptx_processor import extract_pptx_notes, pptx_to_images_and_notes\n", " from src.processors.unified_transcript_generator import (\n", " UnifiedTranscriptProcessor,\n", " process_slides,\n", " process_slides_with_narrative\n", " )\n", " from src.config.settings import load_config, get_config, is_knowledge_enabled\n", " from src.utils.visualization import display_slide_grid, display_slide_preview\n", "\n", " print(\"SUCCESS: All modules imported successfully!\")\n", " print(\"- PPTX processor 
ready\")\n", " print(\"- Unified transcript generator ready\")\n", " print(\"- Configuration manager ready\")\n", " print(\"- Visualization generator ready\")\n", "\n", " # Try to import knowledge base modules\n", " knowledge_available = False\n", " try:\n", " from src.knowledge.faiss_knowledge import FAISSKnowledgeManager\n", " from src.knowledge.context_manager import ContextManager\n", " knowledge_available = True\n", " print(\"- FAISS knowledge base components ready\")\n", " except ImportError as e:\n", " print(f\"- WARNING: Knowledge base components not available: {e}\")\n", "\n", "except ImportError as e:\n", " print(f\"ERROR: Import error: {e}\")\n", " print(\"Make sure you're running from the project root directory\")" ] }, { "cell_type": "code", "execution_count": 53, "id": "cafe366c-3ec6-47c7-8e70-ed69e89ae137", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "SUCCESS: Configuration loaded successfully!\n", "\n", "Current Settings:\n", "- Llama Model: meta-llama/llama-4-maverick-17b-128e-instruct\n", "- Image DPI: 200\n", "- Image Format: png\n", "- Context Window: 5 previous slides (default)\n", "- Knowledge Base: ENABLED\n", " - Knowledge Directory: knowledge_base\n", " - Context Strategy: combined\n", " - Knowledge Weight: 0.3\n", " - Embedding Model: all-MiniLM-L6-v2\n" ] } ], "source": [ "# Load configuration\n", "config = load_config()\n", "print(\"\\nSUCCESS: Configuration loaded successfully!\")\n", "print(\"\\nCurrent Settings:\")\n", "print(f\"- Llama Model: {config['api']['groq_model']}\")\n", "print(f\"- Image DPI: {config['processing']['default_dpi']}\")\n", "print(f\"- Image Format: {config['processing']['default_format']}\")\n", "print(f\"- Context Window: 5 previous slides (default)\")\n", "\n", "# Display knowledge base configuration\n", "knowledge_enabled = is_knowledge_enabled()\n", "print(f\"- Knowledge Base: {'ENABLED' if knowledge_enabled else 'DISABLED'}\")\n", "\n", "if 
knowledge_enabled:\n", " knowledge_config = config.get('knowledge', {})\n", " print(f\" - Knowledge Directory: {knowledge_config.get('knowledge_base_dir', 'knowledge_base')}\")\n", " print(f\" - Context Strategy: {knowledge_config.get('context', {}).get('strategy', 'combined')}\")\n", " print(f\" - Knowledge Weight: {knowledge_config.get('context', {}).get('knowledge_weight', 0.3)}\")\n", " print(f\" - Embedding Model: {knowledge_config.get('embedding', {}).get('model_name', 'all-MiniLM-L6-v2')}\")" ] }, { "cell_type": "markdown", "id": "dd800f7d-3ae5-4291-89d4-32d5cfca6cc7", "metadata": {}, "source": [ "#### Don't forget to update the config file with your pptx file name!\n" ] }, { "cell_type": "code", "execution_count": 54, "id": "58642e4d-cb6f-4e6f-8543-c1290a0e258d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "File Configuration:\n", "- Input File: input/All About Llamas.pptx\n", "- Output Directory: output/\n", "- SUCCESS: Input file found (10.8 MB)\n", "- SUCCESS: Output directory ready\n" ] } ], "source": [ "# Configure file paths from config.yaml\n", "pptx_file = config['current_project']['pptx_file'] + config['current_project']['extension']\n", "output_dir = config['current_project']['output_dir']\n", "\n", "print(\"File Configuration:\")\n", "print(f\"- Input File: {pptx_file}\")\n", "print(f\"- Output Directory: {output_dir}\")\n", "\n", "# Verify input file exists\n", "if Path(pptx_file).exists():\n", " file_size = Path(pptx_file).stat().st_size / 1024 / 1024\n", " print(f\"- SUCCESS: Input file found ({file_size:.1f} MB)\")\n", "else:\n", " print(f\"- ERROR: Input file not found: {pptx_file}\")\n", " print(\" Please update the 'pptx_file' path in config.yaml\")\n", "\n", "# Create output directory if needed\n", "Path(output_dir).mkdir(parents=True, exist_ok=True)\n", "print(f\"- SUCCESS: Output directory ready\")" ] }, { "cell_type": "markdown", "id": "09cf9962-a9f0-4362-a72b-7c11f50772bb", "metadata": {}, 
"source": [ "## Knowledge Base Setup and Validation\n", "\n", "Set up and validate the knowledge base if enabled in configuration.\n" ] }, { "cell_type": "code", "execution_count": 55, "id": "e7666fa8-a4a4-4e7d-bf5d-e34ca992f9b0", "metadata": {}, "outputs": [], "source": [ "def setup_knowledge_base(config):\n", " \"\"\"Setup and validate knowledge base if enabled.\"\"\"\n", " knowledge_enabled = is_knowledge_enabled()\n", "\n", " if not knowledge_enabled:\n", " print(\"Knowledge base is disabled in configuration\")\n", " return None, None\n", "\n", " if not knowledge_available:\n", " print(\"WARNING: Knowledge base is enabled but components are not available\")\n", " return None, None\n", "\n", " print(\"Setting up knowledge base...\")\n", "\n", " knowledge_config = config.get('knowledge', {})\n", " knowledge_base_dir = Path(knowledge_config.get('knowledge_base_dir', 'knowledge_base'))\n", "\n", " # Check if knowledge base directory exists and has content\n", " if not knowledge_base_dir.exists():\n", " print(f\"Creating knowledge base directory: {knowledge_base_dir}\")\n", " knowledge_base_dir.mkdir(parents=True, exist_ok=True)\n", "\n", " # Create sample knowledge base files for demonstration\n", " create_sample_knowledge_base(knowledge_base_dir)\n", "\n", " # List existing knowledge files\n", " md_files = list(knowledge_base_dir.rglob(\"*.md\"))\n", "\n", " print(f\"Knowledge Base Status:\")\n", " print(f\"- Directory: {knowledge_base_dir}\")\n", " print(f\"- Markdown files found: {len(md_files)}\")\n", "\n", " if md_files:\n", " print(\"- Available knowledge files:\")\n", " for md_file in md_files:\n", " file_size = md_file.stat().st_size\n", " print(f\" - {md_file.name} ({file_size} bytes)\")\n", " else:\n", " print(\"- No knowledge files found\")\n", " print(\"- Creating sample knowledge base for demonstration...\")\n", " create_sample_knowledge_base(knowledge_base_dir)\n", " md_files = list(knowledge_base_dir.rglob(\"*.md\"))\n", " print(f\"- Created 
{len(md_files)} sample knowledge files\")\n", "\n", " # Initialize knowledge manager\n", " try:\n", " # Get FAISS configuration from config\n", " vector_config = knowledge_config.get('vector_store', {})\n", " embedding_config = knowledge_config.get('embedding', {})\n", "\n", " # Initialize FAISS knowledge manager with configuration\n", " knowledge_manager = FAISSKnowledgeManager(\n", " knowledge_base_dir=str(knowledge_base_dir),\n", " index_type=vector_config.get('index_type', 'flat'),\n", " embedding_model=embedding_config.get('model_name', 'all-MiniLM-L6-v2'),\n", " use_gpu=vector_config.get('use_gpu', False)\n", " )\n", " knowledge_manager.initialize()\n", "\n", " context_manager = ContextManager()\n", "\n", " # Display knowledge base statistics\n", " stats = knowledge_manager.get_stats()\n", " print(f\"- Knowledge chunks loaded: {stats['total_chunks']}\")\n", " print(f\"- Index type: {stats['index_type']}\")\n", " print(f\"- Embedding model: {stats['embedding_model']}\")\n", " print(f\"- Model loaded: {stats['model_loaded']}\")\n", " print(f\"- Index loaded: {stats['index_loaded']}\")\n", "\n", " return knowledge_manager, context_manager\n", "\n", " except Exception as e:\n", " print(f\"ERROR: Failed to initialize knowledge base: {e}\")\n", " import traceback\n", " traceback.print_exc()\n", " return None, None\n" ] }, { "cell_type": "code", "execution_count": 56, "id": "91f8fd6d-c142-4eb8-a72d-6640a7423af8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Setting up knowledge base...\n", "Knowledge Base Status:\n", "- Directory: knowledge_base\n", "- Markdown files found: 2\n", "- Available knowledge files:\n", " - llama diet.md (5762 bytes)\n", " - llamas.md (7567 bytes)\n", "- Knowledge chunks loaded: 19\n", "- Index type: flat\n", "- Embedding model: all-MiniLM-L6-v2\n", "- Model loaded: True\n", "- Index loaded: True\n" ] } ], "source": [ "# Setup knowledge base\n", "knowledge_manager, context_manager = 
setup_knowledge_base(config)" ] }, { "cell_type": "markdown", "id": "85c830ee-c91f-452b-987e-1652efeb326a", "metadata": {}, "source": [ "## Processing Mode Configuration\n", "\n", "Choose your processing mode and configure the processor with knowledge integration.\n" ] }, { "cell_type": "code", "execution_count": 57, "id": "290d9c7e-19db-44e0-b9c3-8973674b1010", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Processing Mode Configuration:\n", "- Mode: NARRATIVE CONTINUITY\n", "- Context Window: 5 previous slides\n", "- Knowledge Integration: ENABLED\n", " - Knowledge chunks available: 19\n", " - Search strategy: combined\n" ] } ], "source": [ "# Configure processing mode with knowledge integration\n", "\n", "USE_NARRATIVE = True # Set to False for standard processing, True for narrative continuity\n", "CONTEXT_WINDOW_SIZE = 5 # Number of previous slides to use as context (only used when USE_NARRATIVE=True)\n", "ENABLE_KNOWLEDGE = True # Set to False to disable knowledge base integration\n", "\n", "print(\"Processing Mode Configuration:\")\n", "if USE_NARRATIVE:\n", " print(f\"- Mode: NARRATIVE CONTINUITY\")\n", " print(f\"- Context Window: {CONTEXT_WINDOW_SIZE} previous slides\")\n", "else:\n", " print(f\"- Mode: STANDARD PROCESSING\")\n", " print(f\"- Features: Independent slide processing, faster execution\")\n", "\n", "print(f\"- Knowledge Integration: {'ENABLED' if ENABLE_KNOWLEDGE else 'DISABLED'}\")\n", "\n", "if ENABLE_KNOWLEDGE and knowledge_manager:\n", " print(f\" - Knowledge chunks available: {knowledge_manager.get_stats()['total_chunks']}\")\n", " print(f\" - Search strategy: {config.get('knowledge', {}).get('context', {}).get('strategy', 'combined')}\")\n", "\n", "# Initialize the unified processor with knowledge integration\n", "processor = UnifiedTranscriptProcessor(\n", " use_narrative=USE_NARRATIVE,\n", " context_window_size=CONTEXT_WINDOW_SIZE,\n", " enable_knowledge=ENABLE_KNOWLEDGE\n", ")" ] }, { 
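"cell_type": "markdown", "id": "a1b2c3d4-1111-4e5e-9f0a-2b3c4d5e6f70", "metadata": {}, "source": [ "Before running the pipeline, it helps to see what the knowledge retrieval step is doing conceptually. The sketch below is a deliberately simplified stand-in for the FAISS-backed retrieval: it ranks knowledge chunks against slide content with a toy bag-of-words embedding and cosine similarity, whereas the real pipeline uses `all-MiniLM-L6-v2` sentence embeddings and a FAISS flat (exact) index. The `retrieve` helper here is hypothetical and is not part of the `FAISSKnowledgeManager` API:\n", "\n", "```python\n", "import math\n", "import re\n", "from collections import Counter\n", "\n", "def embed(text):\n", "    # Toy bag-of-words 'embedding'; the real pipeline uses\n", "    # sentence-transformers (all-MiniLM-L6-v2) vectors instead.\n", "    return Counter(re.findall(r'[a-z]+', text.lower()))\n", "\n", "def cosine(a, b):\n", "    dot = sum(a[t] * b[t] for t in a if t in b)\n", "    na = math.sqrt(sum(v * v for v in a.values()))\n", "    nb = math.sqrt(sum(v * v for v in b.values()))\n", "    return dot / (na * nb) if na and nb else 0.0\n", "\n", "def retrieve(query, chunks, top_k=2):\n", "    # Rank chunks by similarity to the slide content, mirroring\n", "    # what a FAISS flat (exact) index does at much larger scale.\n", "    q = embed(query)\n", "    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]\n", "\n", "chunks = [\n", "    'Llamas are domesticated camelids native to the Andes.',\n", "    'A llama diet consists mostly of grasses and hay.',\n", "    'FAISS supports exact and approximate nearest-neighbor search.',\n", "]\n", "print(retrieve('llama diet', chunks, top_k=1))\n", "```\n", "\n", "With real sentence embeddings the same idea generalizes beyond exact word overlap, which is how the pipeline can match a slide about feeding behavior to the `llama diet.md` knowledge file.\n" ] }, {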
"cell_type": "markdown", "id": "2cd7bd6d-364a-4350-9f38-b988323fcdae", "metadata": {}, "source": [ "## Processing Pipeline\n", "\n", "Execute the main processing pipeline in three key steps.\n" ] }, { "cell_type": "markdown", "id": "1ce1e223-faf0-4ab3-996d-a451bed30fc9", "metadata": {}, "source": [ "### Step 1: Extract Content and Convert to Images\n", "\n", "Extract speaker notes and slide text, then convert the presentation to high-quality images for AI analysis.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 58, "id": "db3ad12e-03d8-45cb-9999-b167d2ab93c5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PROCESSING: Converting PPTX to images and extracting notes...\n", "Processing: All About Llamas.pptx\n", "Extracting speaker notes...\n", "Found notes on 10 of 10 slides\n", "Notes df saved to: /Users/yucedincer/Desktop/Projects/llama-cookbook/end-to-end-use-cases/powerpoint-to-voiceover-transcript/output/All About Llamas_notes.csv\n", "Converting to PDF...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Converting to PNG images at 200 DPI...\n", "\n", "Successfully processed 10 slides\n", "Images saved to: /Users/yucedincer/Desktop/Projects/llama-cookbook/end-to-end-use-cases/powerpoint-to-voiceover-transcript/output\n", "\n", "SUCCESS: Processing completed successfully!\n", "- Processed 10 slides\n", "- Images saved to: /Users/yucedincer/Desktop/Projects/llama-cookbook/end-to-end-use-cases/powerpoint-to-voiceover-transcript/output\n", "- Found notes on 10 slides\n", "- DataFrame shape: (10, 8)\n", "\n", "Sample Data (First 5 slides):\n" ] }, { "data": { "text/html": [ "
 | slide_number | slide_title | has_notes | notes_word_count | slide_text_word_count |\n", "|---|---|---|---|---|---|\n", "| 0 | 1 | Llamas: Fascinating Animals of the Andes | True | 34 | 14 |\n", "| 1 | 2 | Introduction to Llamas | True | 28 | 25 |\n", "| 2 | 3 | Physical Characteristics | True | 28 | 33 |\n", "| 3 | 4 | Diet & Habitat | True | 24 | 23 |\n", "| 4 | 5 | Behavior & Social Structure | True | 31 | 30 |\n", "