# PowerPoint to Voiceover Transcript

A Llama 4-powered solution that converts PowerPoint presentations into text-to-speech-ready voiceover transcripts, designed for creating professional narration content from slide decks.

## Overview

This system extracts speaker notes and visual content from PowerPoint files, then uses the Llama 4 Maverick model to generate natural-sounding transcripts optimized for human voiceover or text-to-speech systems. The generated transcripts include proper pronunciation of technical terms, numbers, and model names.

### Key Features

- **AI-Powered Analysis**: Uses Llama 4 Maverick to understand slide content and context
- **Unified Processing**: Single processor handles both standard and narrative-aware modes
- **Narrative Continuity**: Optional context-aware processing maintains smooth transitions
- **Speech Optimization**: Converts numbers, decimals, and technical terms to spoken form
- **Visualization Tools**: Built-in utilities for displaying slide images in Jupyter notebooks
- **Flexible Configuration**: Toggle between processing modes with simple flags
- **Cross-Platform**: Works on Windows, macOS, and Linux
- **Production Ready**: Comprehensive error handling, progress tracking, and retry logic

## Quick Start

### Prerequisites

- Python 3.12+
- LibreOffice (for PPTX conversion)
- Llama API key

### Installation

#### Option 1: Using uv (Recommended - Faster)

1. **Install uv (if not already installed):**

   ```bash
   # macOS/Linux
   curl -LsSf https://astral.sh/uv/install.sh | sh

   # Windows
   powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

   # Or via pip
   pip install uv
   ```

2. **Clone and install dependencies:**

   ```bash
   git clone https://github.com/meta-llama/llama-cookbook.git
   cd powerpoint-to-voiceover-transcript
   uv sync
   ```

3. **Activate the virtual environment:**

   ```bash
   source .venv/bin/activate   # macOS/Linux
   # or
   .venv\Scripts\activate      # Windows
   ```

#### Option 2: Using pip (Traditional)

1. **Clone and install dependencies:**

   ```bash
   git clone https://github.com/meta-llama/llama-cookbook.git
   cd powerpoint-to-voiceover-transcript
   pip install -e .
   ```

2. **Install LibreOffice:**

   - **macOS**: `brew install --cask libreoffice`
   - **Ubuntu**: `sudo apt-get install libreoffice`
   - **Windows**: Download from [libreoffice.org](https://www.libreoffice.org/download/)

3. **Set up environment:**

   ```bash
   cp .env.example .env
   # Edit .env and add your LLAMA_API_KEY
   ```

4. **Configure your presentation:**

   ```yaml
   # Edit config.yaml - update the pptx_file path
   current_project:
     pptx_file: "input/your_presentation_name"
     extension: ".pptx"
   ```
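Before launching a workflow, it can help to verify the setup. The snippet below is a minimal, optional sanity check, not part of the packaged tooling; it assumes only the `config.yaml` layout shown in step 4 and that `LLAMA_API_KEY` has been exported to your environment (for example from your `.env` file).

```python
import os

import yaml

# Read the project configuration (structure shown in step 4 above).
with open("config.yaml") as f:
    config = yaml.safe_load(f)

project = config["current_project"]
pptx_path = project["pptx_file"] + project["extension"]

# Confirm the presentation file is in place and the API key is visible.
print(f"Presentation file exists: {os.path.exists(pptx_path)} ({pptx_path})")
print(f"LLAMA_API_KEY set: {bool(os.environ.get('LLAMA_API_KEY'))}")
```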
### Basic Usage

#### Narrative Continuity Workflow

For presentations requiring smooth narrative flow and consistent terminology:

```bash
jupyter notebook narrative_continuity_workflow.ipynb
```

This workflow uses previous slide transcripts as context to maintain narrative continuity and ensure smooth transitions between slides. Features include:

- **Context-aware processing**: Uses 5 previous slides as context by default
- **Consistent terminology**: Maintains terminology consistency throughout the presentation
- **Smooth transitions**: Generates natural flow between slides
- **Enhanced output**: Includes narrative context analysis and relationship mapping

Or use the Python API:

```python
from src.core.pptx_processor import pptx_to_images_and_notes
from src.processors.unified_transcript_generator import UnifiedTranscriptProcessor

# Convert PPTX and extract notes
result = pptx_to_images_and_notes("presentation.pptx", "output/")

# Generate transcripts
processor = UnifiedTranscriptProcessor()
transcripts = processor.process_slides_dataframe(result['notes_df'], "output/")

# Save results
transcripts.to_csv("transcripts.csv", index=False)
```

## Project Structure

```
powerpoint-to-voiceover-transcript/
├── README.md                             # This file
├── config.yaml                           # Main configuration
├── pyproject.toml                        # Dependencies and project metadata
├── uv.lock                               # uv dependency lock file
├── narrative_continuity_workflow.ipynb   # Narrative-aware workflow
├── .env.example                          # Environment template
├── input/                                # Place your PPTX files here
├── output/                               # Generated images and transcripts
└── src/
    ├── config/
    │   └── settings.py                   # Configuration management
    ├── core/
    │   ├── file_utils.py                 # File system utilities
    │   ├── image_processing.py           # Image encoding for API
    │   ├── llama_client.py               # Llama API integration
    │   └── pptx_processor.py             # PPTX extraction and conversion
    ├── processors/
    │   └── unified_transcript_generator.py  # Unified processor (standard + narrative)
    └── utils/
        └── visualization.py              # Slide image display utilities
```

## Configuration

The system uses `config.yaml` for settings:

```yaml
# API Configuration
api:
  llama_model: "Llama-4-Maverick-17B-128E-Instruct-FP8"
  max_retries: 3

# Processing Settings
processing:
  default_dpi: 200
  default_format: "png"
  batch_size: 5

# Your Project
current_project:
  pptx_file: "input/your_presentation"
  extension: ".pptx"
  output_dir: "output/"
```

## API Reference

### Core Functions

#### `pptx_to_images_and_notes(pptx_path, output_dir)`

Converts PowerPoint to images and extracts speaker notes.

**Returns:** Dictionary with `image_files`, `notes_df`, and `output_dir`

#### `UnifiedTranscriptProcessor(use_narrative=True, context_window_size=5)`

Main class for generating AI transcripts with configurable processing modes.

**Parameters:**
- `use_narrative` (bool): Enable narrative continuity mode (default: True)
- `context_window_size` (int): Number of previous slides to use as context (default: 5)

**Methods:**
- `process_slides_dataframe(df, output_dir, save_context=True)` - Process all slides
- `process_single_slide(image_path, speaker_notes, slide_number, slide_title)` - Process one slide

### Processing Modes

#### Standard Mode (`use_narrative=False`)

- **Best for**: Simple presentations, quick processing, independent slides
- **Features**: Fast execution, no context dependencies
- **Use cases**: Training materials, product demos, standalone slides

#### Narrative Mode (`use_narrative=True`)

- **Best for**: Story-driven presentations, complex topics, educational content
- **Features**: Context awareness, smooth transitions, terminology consistency
- **Use cases**: Conference talks, educational courses, marketing presentations
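To make the mode switch concrete, here is a short sketch built only from the constructor parameters and methods documented above; the smaller context window and the output file name are illustrative choices, not defaults.

```python
from src.core.pptx_processor import pptx_to_images_and_notes
from src.processors.unified_transcript_generator import UnifiedTranscriptProcessor

# Convert the deck once; both modes reuse the extracted notes and slide images.
result = pptx_to_images_and_notes("presentation.pptx", "output/")

# Standard mode: slides are processed independently (fastest, no context).
standard = UnifiedTranscriptProcessor(use_narrative=False)
standard_df = standard.process_slides_dataframe(result["notes_df"], "output/")

# Narrative mode: each slide sees up to `context_window_size` previous slides.
# A smaller window (e.g. 3) can suit shorter presentations.
narrative = UnifiedTranscriptProcessor(use_narrative=True, context_window_size=3)
narrative_df = narrative.process_slides_dataframe(
    result["notes_df"], "output/", save_context=True
)

narrative_df.to_csv("output/narrative_transcripts.csv", index=False)
```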
### Visualization Utilities

#### `display_slide_grid(image_files, max_cols=3, figsize_per_image=(4, 3))`

Display slide images in a grid layout for Jupyter notebooks.

**Parameters:**
- `image_files` (List): List of image file paths
- `max_cols` (int): Maximum columns in grid (default: 3)
- `figsize_per_image` (Tuple): Size of each image as (width, height) (default: (4, 3))

**Example:**

```python
from src.utils.visualization import display_slide_grid, display_slide_preview

# Display first 6 slides in a 3-column grid
display_slide_grid(image_files[:6], max_cols=3, figsize_per_image=(4, 3))

# Or use the convenience function
display_slide_preview(image_files, num_slides=6, max_cols=3)
```

#### `display_slide_preview(image_files, num_slides=6, max_cols=3, figsize_per_image=(4, 3))`

Display a preview of the first N slide images with automatic grid layout.

### Speech Optimization

The AI automatically converts technical content for natural speech:

- **Decimals**: `3.2` → "three dot two"
- **Model names**: `LLaMA-3.2` → "LLaMA three dot two"
- **Abbreviations**: `LLM` → "L L M"
- **Large numbers**: `70B` → "seventy billion"

You can add your own rules in the system prompt.

## Requirements

### System Dependencies

- **LibreOffice**: Required for PPTX to PDF conversion
- **Python 3.12+**: Core runtime

### Python Dependencies

- `pandas>=2.3.1` - Data processing
- `python-pptx>=1.0.2` - PowerPoint file handling
- `pymupdf>=1.24.0` - PDF to image conversion
- `llama-api-client>=0.1.0` - AI model access
- `pillow>=11.3.0` - Image processing
- `pyyaml>=6.0.0` - Configuration management
- `matplotlib>=3.5.0` - Visualization utilities

See `pyproject.toml` for the complete dependency list.

## Output

### Narrative Continuity Workflow Output

Enhanced output includes:

1. **Narrative-Aware Transcripts**: Context-aware voiceover content with smooth transitions
2. **Context Analysis**: Information about how previous slides influenced each transcript
3. **Narrative Summary**: Overall analysis of presentation flow and consistency
4. **Multiple Formats**: CSV and JSON exports with context information
5. **Context Files**: Detailed narrative context data for each slide
6. **Visual Preview**: Grid display of slide images for verification

## Troubleshooting

### Common Issues

**"LibreOffice not found"**
- Install LibreOffice or update its path in `config.yaml`

**"API key not found"**
- Set `LLAMA_API_KEY` in your `.env` file

**"Permission denied"**
- Ensure write permissions to output directories

**"Invalid image format"**
- Use supported formats: `png`, `jpeg`, `jpg`

**"uv sync fails"**
- Make sure you have Python 3.12+ installed
- Try `uv python install 3.12` to install Python via uv

**"Context window too large"**
- Reduce the `context_window_size` parameter in the narrative workflow
- The default is 5 slides; try 3 for shorter presentations

**"Images not displaying in notebook"**
- Ensure matplotlib is installed: `pip install matplotlib`
- Check that image files exist in the output directory
- Try restarting the Jupyter kernel

---