{ "cells": [ { "cell_type": "markdown", "id": "6c33ba3a", "metadata": {}, "source": [ "# PowerPoint to Narrative-Aware Voiceover Transcript Generator\n", "\n", "This notebook demonstrates the complete workflow for converting PowerPoint presentations into AI-generated voiceover transcripts using the unified transcript processor with Llama 4 Maverick through the Llama API.\n", "\n", "## Overview\n", "\n", "This unified workflow performs the following operations:\n", "\n", "1. **Content Extraction**: Pulls speaker notes and visual elements from PowerPoint slides\n", "2. **Image Conversion**: Transforms slides into high-quality images for AI analysis\n", "3. **Flexible Processing**: Choose between standard or narrative-aware processing modes\n", "4. **Transcript Generation**: Creates natural-sounding voiceover content with optional narrative continuity\n", "5. **Speech Optimization**: Converts numbers, technical terms, and abbreviations to spoken form\n", "6. **Results Export**: Saves transcripts and context information in multiple formats\n", "\n", "## Key Features\n", "\n", "- **Unified Processor**: Single class handles both standard and narrative-aware processing\n", "- **Configurable Context**: Adjustable context window for narrative continuity\n", "- **Mode Selection**: Toggle between standard and narrative processing with a simple flag\n", "- **Backward Compatibility**: Maintains compatibility with existing workflows\n", "\n", "## Prerequisites\n", "\n", "Before running this notebook, ensure you have:\n", "- Created a `.env` file with your `LLAMA_API_KEY`\n", "- Updated `config.yaml` with your presentation file path\n", "---" ] }, { "cell_type": "markdown", "id": "d8965447", "metadata": {}, "source": [ "## Setup and Configuration\n", "\n", "Import required libraries and load environment configuration." 
] }, { "cell_type": "code", "execution_count": 29, "id": "21a962b2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SUCCESS: Environment loaded successfully!\n", "SUCCESS: Llama API key found\n" ] } ], "source": [ "# Import required libraries\n", "import pandas as pd\n", "import os\n", "from pathlib import Path\n", "from dotenv import load_dotenv\n", "import matplotlib.pyplot as plt\n", "from IPython.display import display\n", "\n", "# Load environment variables from .env file\n", "load_dotenv()\n", "\n", "# Verify setup\n", "if os.getenv('LLAMA_API_KEY'):\n", " print(\"SUCCESS: Environment loaded successfully!\")\n", " print(\"SUCCESS: Llama API key found\")\n", "else:\n", " print(\"WARNING: LLAMA_API_KEY not found in .env file\")\n", " print(\"Please check your .env file and add your API key\")" ] }, { "cell_type": "code", "execution_count": 30, "id": "71c1c8bd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SUCCESS: All modules imported successfully!\n", "- PPTX processor ready\n", "- Unified transcript generator ready\n", "- Configuration manager ready\n" ] } ], "source": [ "# Import custom modules\n", "try:\n", " from src.core.pptx_processor import extract_pptx_notes, pptx_to_images_and_notes\n", " from src.processors.unified_transcript_generator import (\n", " UnifiedTranscriptProcessor,\n", " process_slides,\n", " process_slides_with_narrative\n", " )\n", " from src.config.settings import load_config, get_config\n", "\n", " print(\"SUCCESS: All modules imported successfully!\")\n", " print(\"- PPTX processor ready\")\n", " print(\"- Unified transcript generator ready\")\n", " print(\"- Configuration manager ready\")\n", "\n", "except ImportError as e:\n", " print(f\"ERROR: Import error: {e}\")\n", " print(\"Make sure you're running from the project root directory\")" ] }, { "cell_type": "code", "execution_count": 31, "id": "53781172", "metadata": {}, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "SUCCESS: Configuration loaded successfully!\n", "\n", "Current Settings:\n", "- Llama Model: Llama-4-Maverick-17B-128E-Instruct-FP8\n", "- Image DPI: 200\n", "- Image Format: png\n", "- Context Window: 5 previous slides (default)\n" ] } ], "source": [ "# Load and display configuration\n", "config = load_config()\n", "print(\"SUCCESS: Configuration loaded successfully!\")\n", "print(\"\\nCurrent Settings:\")\n", "print(f\"- Llama Model: {config['api']['llama_model']}\")\n", "print(f\"- Image DPI: {config['processing']['default_dpi']}\")\n", "print(f\"- Image Format: {config['processing']['default_format']}\")\n", "print(f\"- Context Window: 5 previous slides (default)\")" ] }, { "cell_type": "markdown", "id": "e11ef993-f0bc-4eba-82cb-e8d4b083196e", "metadata": {}, "source": [ "#### Don't forget to update the config file with your pptx file name!" ] }, { "cell_type": "code", "execution_count": 32, "id": "9386e035", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "File Configuration:\n", "- Input File: input/All About Llamas.pptx\n", "- Output Directory: output/\n", "- SUCCESS: Input file found (10.8 MB)\n", "- SUCCESS: Output directory ready\n" ] } ], "source": [ "# Configure file paths from config.yaml\n", "pptx_file = config['current_project']['pptx_file'] + config['current_project']['extension']\n", "output_dir = config['current_project']['output_dir']\n", "\n", "print(\"File Configuration:\")\n", "print(f\"- Input File: {pptx_file}\")\n", "print(f\"- Output Directory: {output_dir}\")\n", "\n", "# Verify input file exists\n", "if Path(pptx_file).exists():\n", " file_size = Path(pptx_file).stat().st_size / 1024 / 1024\n", " print(f\"- SUCCESS: Input file found ({file_size:.1f} MB)\")\n", "else:\n", " print(f\"- ERROR: Input file not found: {pptx_file}\")\n", " print(\" Please update the 'pptx_file' path in config.yaml\")\n", "\n", "# Create output directory if needed\n", 
"Path(output_dir).mkdir(parents=True, exist_ok=True)\n", "print(f\"- SUCCESS: Output directory ready\")" ] }, { "cell_type": "markdown", "id": "35a9e13a-4f85-488e-880b-62c7512d1248", "metadata": {}, "source": [ "## Processing Mode Configuration\n", "\n", "Choose your processing mode and configure the unified processor." ] }, { "cell_type": "code", "execution_count": 33, "id": "6fbfcb28-2f09-4497-8098-35cf3d62ebf3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Processing Mode Configuration:\n", "- Mode: NARRATIVE CONTINUITY\n", "- Context Window: 5 previous slides\n" ] } ], "source": [ "# Configure processing mode\n", "\n", "USE_NARRATIVE = True # Set to False for standard processing, True for narrative continuity\n", "CONTEXT_WINDOW_SIZE = 5 # Number of previous slides to use as context (only used when USE_NARRATIVE=True)\n", "\n", "print(\"Processing Mode Configuration:\")\n", "if USE_NARRATIVE:\n", " print(f\"- Mode: NARRATIVE CONTINUITY\")\n", " print(f\"- Context Window: {CONTEXT_WINDOW_SIZE} previous slides\")\n", "else:\n", " print(f\"- Mode: STANDARD PROCESSING\")\n", " print(f\"- Features: Independent slide processing, faster execution\")\n", "\n", "# Initialize the unified processor\n", "processor = UnifiedTranscriptProcessor(\n", " use_narrative=USE_NARRATIVE,\n", " context_window_size=CONTEXT_WINDOW_SIZE\n", ")" ] }, { "cell_type": "markdown", "id": "ea4851e6", "metadata": {}, "source": [ "---\n", "## Processing Pipeline\n", "\n", "Execute the main processing pipeline in three key steps." ] }, { "cell_type": "markdown", "id": "0f098fdf", "metadata": {}, "source": [ "### Step 1: Extract Content and Convert to Images\n", "\n", "Extract speaker notes and slide text, then convert the presentation to high-quality images for AI analysis." 
] }, { "cell_type": "code", "execution_count": 34, "id": "644ee94c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PROCESSING: Converting PPTX to images and extracting notes...\n", "Processing: All About Llamas.pptx\n", "Extracting speaker notes...\n", "Found notes on 10 of 10 slides\n", "Notes df saved to: /Users/yucedincer/Desktop/Projects/llama-cookbook/end-to-end-use-cases/powerpoint-to-voiceover-transcript/output/All About Llamas_notes.csv\n", "Converting to PDF...\n", "Converting to PNG images at 200 DPI...\n", "\n", "Successfully processed 10 slides\n", "Images saved to: /Users/yucedincer/Desktop/Projects/llama-cookbook/end-to-end-use-cases/powerpoint-to-voiceover-transcript/output\n", "\n", "SUCCESS: Processing completed successfully!\n", "- Processed 10 slides\n", "- Images saved to: /Users/yucedincer/Desktop/Projects/llama-cookbook/end-to-end-use-cases/powerpoint-to-voiceover-transcript/output\n", "- Found notes on 10 slides\n", "- DataFrame shape: (10, 8)\n", "\n", "Sample Data (First 5 slides):\n" ] }, { "data": { "text/html": [ "
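<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n",
" <th></th>\n", " <th>slide_number</th>\n", " <th>slide_title</th>\n", " <th>has_notes</th>\n", " <th>notes_word_count</th>\n", " <th>slide_text_word_count</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n",
" <tr>\n", " <th>0</th>\n", " <td>1</td>\n", " <td>Llamas: Fascinating Animals of the Andes</td>\n", " <td>True</td>\n", " <td>34</td>\n", " <td>14</td>\n", " </tr>\n",
" <tr>\n", " <th>1</th>\n", " <td>2</td>\n", " <td>Introduction to Llamas</td>\n", " <td>True</td>\n", " <td>28</td>\n", " <td>25</td>\n", " </tr>\n",
" <tr>\n", " <th>2</th>\n", " <td>3</td>\n", " <td>Physical Characteristics</td>\n", " <td>True</td>\n", " <td>28</td>\n", " <td>33</td>\n", " </tr>\n",
" <tr>\n", " <th>3</th>\n", " <td>4</td>\n", " <td>Diet &amp; Habitat</td>\n", " <td>True</td>\n", " <td>24</td>\n", " <td>23</td>\n", " </tr>\n",
" <tr>\n", " <th>4</th>\n", " <td>5</td>\n", " <td>Behavior &amp; Social Structure</td>\n", " <td>True</td>\n", " <td>31</td>\n", " <td>30</td>\n", " </tr>\n",
" </tbody>\n", "</table>\n", "</div>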
" ], "text/plain": [ " slide_number slide_title has_notes \\\n", "0 1 Llamas: Fascinating Animals of the Andes True \n", "1 2 Introduction to Llamas True \n", "2 3 Physical Characteristics True \n", "3 4 Diet & Habitat True \n", "4 5 Behavior & Social Structure True \n", "\n", " notes_word_count slide_text_word_count \n", "0 34 14 \n", "1 28 25 \n", "2 28 33 \n", "3 24 23 \n", "4 31 30 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(\"PROCESSING: Converting PPTX to images and extracting notes...\")\n", "\n", "result = pptx_to_images_and_notes(\n", " pptx_path=pptx_file,\n", " output_dir=output_dir,\n", " extract_notes=True\n", ")\n", "\n", "notes_df = result['notes_df']\n", "image_files = result['image_files']\n", "\n", "print(f\"\\nSUCCESS: Processing completed successfully!\")\n", "print(f\"- Processed {len(image_files)} slides\")\n", "print(f\"- Images saved to: {result['output_dir']}\")\n", "print(f\"- Found notes on {notes_df['has_notes'].sum()} slides\")\n", "print(f\"- DataFrame shape: {notes_df.shape}\")\n", "\n", "# Show sample data\n", "print(\"\\nSample Data (First 5 slides):\")\n", "display(notes_df[['slide_number', 'slide_title', 'has_notes', 'notes_word_count', 'slide_text_word_count']].head())" ] }, { "cell_type": "markdown", "id": "1f95749d", "metadata": {}, "source": [ "### Step 2: Generate Narrative-Aware AI Transcripts\n", "\n", "Use the Llama vision model to analyze each slide image and generate natural-sounding voiceover transcripts with narrative continuity.\n", "\n", "This process:\n", "- Analyzes slide visual content using AI vision\n", "- Uses transcripts from previous slides as context\n", "- Combines slide content with speaker notes\n", "- Generates speech-optimized transcripts with smooth transitions\n", "- Maintains consistent terminology throughout the presentation\n", "- Converts numbers and technical terms to spoken form" ] }, { "cell_type": "code", "execution_count": 35, "id": "fe564b99", "metadata": {}, 
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PROCESSING: Starting AI transcript generation with unified processor...\n", "- Processing 10 slides\n", "- Using model: Llama-4-Maverick-17B-128E-Instruct-FP8\n", "- Mode: Narrative Continuity\n", "- Context window: 5 previous slides\n", "- Using previous transcripts as context for narrative continuity\n", "- This may take several minutes...\n", "Processing 10 slides with narrative continuity...\n", "Using context window of 5 previous slides\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Processing slides: 100%|████████████████████████| 10/10 [00:28<00:00, 2.88s/it]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Context information saved to: output/narrative_context\n", "\n", "SUCCESS: Transcript generation completed!\n", "- Generated 10 transcripts\n", "- Average length: 703 characters\n", "- Total words: 1,019\n", "- Context information saved to: output/narrative_context/\n", "- Average context slides used: 3.5\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "print(\"PROCESSING: Starting AI transcript generation with unified processor...\")\n", "print(f\"- Processing {len(notes_df)} slides\")\n", "print(f\"- Using model: {config['api']['llama_model']}\")\n", "print(f\"- Mode: {'Narrative Continuity' if USE_NARRATIVE else 'Standard Processing'}\")\n", "if USE_NARRATIVE:\n", " print(f\"- Context window: {CONTEXT_WINDOW_SIZE} previous slides\")\n", " print(f\"- Using previous transcripts as context for narrative continuity\")\n", "print(\"- This may take several minutes...\")\n", "\n", "# Generate transcripts using the unified processor\n", "processed_df = processor.process_slides_dataframe(\n", " df=notes_df,\n", " output_dir=output_dir,\n", " save_context=True # Only saves context if USE_NARRATIVE=True\n", ")\n", "\n", "print(f\"\\nSUCCESS: Transcript generation completed!\")\n", "print(f\"- Generated {len(processed_df)} 
transcripts\")\n", "print(f\"- Average length: {processed_df['ai_transcript'].str.len().mean():.0f} characters\")\n", "print(f\"- Total words: {processed_df['ai_transcript'].str.split().str.len().sum():,}\")\n", "\n", "if USE_NARRATIVE:\n", " print(f\"- Context information saved to: {output_dir}narrative_context/\")\n", " print(f\"- Average context slides used: {processed_df['context_slides_used'].mean():.1f}\")" ] }, { "cell_type": "markdown", "id": "5cff4b70", "metadata": {}, "source": [ "### Step 3: Save Results\n", "\n", "Save results in multiple formats for different use cases." ] }, { "cell_type": "code", "execution_count": 37, "id": "8463ac3a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PROCESSING: Saving results in multiple formats...\n", "- SUCCESS: Complete results saved to output/narrative_transcripts.csv\n", "- SUCCESS: Clean transcripts saved to output/narrative_transcripts_clean.csv\n", "- SUCCESS: JSON format saved to output/narrative_transcripts.json\n", "\n", "Export Summary:\n", "- Processing mode: Narrative Continuity\n", "- Total slides processed: 10\n", "- Slides with speaker notes: 10\n", "- Total transcript words: 1,019\n", "- Average transcript length: 703 characters\n", "- Estimated reading time: 6.8 minutes\n", "- Average context slides per slide: 3.5\n" ] } ], "source": [ "print(\"PROCESSING: Saving results in multiple formats...\")\n", "\n", "# Create output directory\n", "os.makedirs(output_dir, exist_ok=True)\n", "\n", "# Determine file prefix based on processing mode\n", "file_prefix = \"narrative\" if USE_NARRATIVE else \"standard\"\n", "\n", "# Save complete results with all metadata\n", "output_file = f\"{output_dir}{file_prefix}_transcripts.csv\"\n", "processed_df.to_csv(output_file, index=False)\n", "print(f\"- SUCCESS: Complete results saved to {output_file}\")\n", "\n", "# Save transcript-only version for voiceover work\n", "if USE_NARRATIVE:\n", " transcript_only = 
processed_df[['slide_number', 'slide_title', 'ai_transcript', 'context_slides_used']]\n", "else:\n", " transcript_only = processed_df[['slide_number', 'slide_title', 'ai_transcript']]\n", "\n", "transcript_file = f\"{output_dir}{file_prefix}_transcripts_clean.csv\"\n", "transcript_only.to_csv(transcript_file, index=False)\n", "print(f\"- SUCCESS: Clean transcripts saved to {transcript_file}\")\n", "\n", "# Save as JSON for API integration\n", "json_file = f\"{output_dir}{file_prefix}_transcripts.json\"\n", "processed_df.to_json(json_file, orient='records', indent=2)\n", "print(f\"- SUCCESS: JSON format saved to {json_file}\")\n", "\n", "# Summary statistics\n", "total_words = processed_df['ai_transcript'].str.split().str.len().sum()\n", "reading_time = total_words / 150 # Assuming 150 words per minute\n", "\n", "print(f\"\\nExport Summary:\")\n", "print(f\"- Processing mode: {'Narrative Continuity' if USE_NARRATIVE else 'Standard Processing'}\")\n", "print(f\"- Total slides processed: {len(processed_df)}\")\n", "print(f\"- Slides with speaker notes: {processed_df['has_notes'].sum()}\")\n", "print(f\"- Total transcript words: {total_words:,}\")\n", "print(f\"- Average transcript length: {processed_df['ai_transcript'].str.len().mean():.0f} characters\")\n", "print(f\"- Estimated reading time: {reading_time:.1f} minutes\")\n", "\n", "if USE_NARRATIVE and 'context_slides_used' in processed_df.columns:\n", " print(f\"- Average context slides per slide: {processed_df['context_slides_used'].mean():.1f}\")" ] }, { "cell_type": "markdown", "id": "8728d2ac", "metadata": {}, "source": [ "---\n", "# Completion Summary\n", "\n", "## Successfully Generated:\n", "- **Unified Processing**: Single processor handles both standard and narrative modes\n", "- **Flexible Configuration**: Easy switching between processing modes\n", "- **Speech-Optimized Transcripts**: Natural-sounding voiceover content\n", "- **Multiple Formats**: CSV, JSON exports for different use cases\n", "- 
**Context Analysis**: Detailed information about narrative flow (when enabled)\n", "\n", "## Output Files:\n", "- `[mode]_transcripts.csv` - Complete dataset with metadata\n", "- `[mode]_transcripts_clean.csv` - Clean transcripts for voiceover work\n", "- `[mode]_transcripts.json` - JSON format for API integration\n", "- `narrative_context/` - Context analysis files (narrative mode only)\n", "- Individual slide images in PNG/JPEG format\n", "\n", "## Processing Modes:\n", "\n", "### Standard Mode (`USE_NARRATIVE = False`)\n", "- **Best for**: Simple presentations, quick processing, independent slides\n", "- **Features**: Fast execution, no context dependencies\n", "- **Use cases**: Training materials, product demos, standalone slides\n", "\n", "### Narrative Mode (`USE_NARRATIVE = True`)\n", "- **Best for**: Story-driven presentations, complex topics, educational content\n", "- **Features**: Context awareness, smooth transitions, terminology consistency\n", "- **Use cases**: Conference talks, educational courses, marketing presentations\n", "\n", "## Next Steps:\n", "1. **Review** generated transcripts for accuracy and flow\n", "2. **Edit** any content that needs refinement\n", "3. **Create** voiceover recordings or use TTS systems\n", "4. **Integrate** JSON data into your video production workflow\n", "5. 
**Experiment** with different processing modes for optimal results\n", "\n", "## Tips for Better Results:\n", "- **Rich Speaker Notes**: Detailed notes improve transcript quality in both modes\n", "- **Clear Visuals**: High-contrast slides with readable text work best\n", "- **Mode Selection**: Use narrative mode for complex presentations, standard for simple ones\n", "- **Context Window**: Adjust context window size (3-7 slides) based on presentation complexity\n", "- **Consistent Style**: Maintain consistent formatting across your presentation\n", "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "7122cdf6-667e-4ae4-8ce7-67cfc32577c8", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "promptTesting", "language": "python", "name": "prompttesting" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.2" } }, "nbformat": 4, "nbformat_minor": 5 }