
PowerPoint to Voiceover Transcript

A solution powered by Llama 4 that converts PowerPoint presentations into text-to-speech-ready voiceover transcripts, designed for creating professional narration content from slide decks.

Overview

This system extracts speaker notes and visual content from PowerPoint files, then uses the Llama 4 Maverick model to generate natural-sounding transcripts optimized for human voiceover or text-to-speech systems. The generated transcripts spell out technical terms, numbers, and model names so they are pronounced correctly.

Key Features

  • AI-Powered Analysis: Uses Llama 4 Maverick to understand slide content and context
  • Unified Processing: Single processor handles both standard and narrative-aware modes
  • Narrative Continuity: Optional context-aware processing maintains smooth transitions
  • Speech Optimization: Converts numbers, decimals, and technical terms to spoken form
  • Visualization Tools: Built-in utilities for displaying slide images in Jupyter notebooks
  • Flexible Configuration: Toggle between processing modes with simple flags
  • Cross-Platform: Works on Windows, macOS, and Linux
  • Production Ready: Comprehensive error handling, progress tracking, and retry logic

Quick Start

Prerequisites

  • Python 3.12+
  • LibreOffice (for PPTX conversion)
  • Llama API key

Installation

Option 1: Using uv (Recommended - Faster)

  1. Install uv (if not already installed):

    # macOS/Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh

    # Windows
    powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

    # Or via pip
    pip install uv
    
  2. Clone and install dependencies:

    git clone <repository-url>
    cd powerpoint-to-voiceover-transcript
    uv sync
    
  3. Activate the virtual environment:

    source .venv/bin/activate  # macOS/Linux
    # or
    .venv\Scripts\activate     # Windows
    

Option 2: Using pip (Traditional)

  1. Clone and install dependencies:

    git clone https://github.com/meta-llama/llama-cookbook.git
    cd powerpoint-to-voiceover-transcript
    pip install -e .
    
  2. Install LibreOffice:

    • macOS: brew install --cask libreoffice
    • Ubuntu: sudo apt-get install libreoffice
    • Windows: Download from libreoffice.org
  3. Set up environment:

    cp .env.example .env
    # Edit .env and add your LLAMA_API_KEY
    
  4. Configure your presentation:

    # Edit config.yaml - update the pptx_file path
    current_project:
      pptx_file: "input/your_presentation_name"
      extension: ".pptx"
    

Basic Usage

Narrative Continuity Workflow

For presentations requiring smooth narrative flow and consistent terminology:

jupyter notebook narrative_continuity_workflow.ipynb

This workflow uses previous slide transcripts as context to maintain narrative continuity and ensure smooth transitions between slides. Features include:

  • Context-aware processing: Uses 5 previous slides as context by default
  • Consistent terminology: Maintains terminology consistency throughout the presentation
  • Smooth transitions: Generates natural flow between slides
  • Enhanced output: Includes narrative context analysis and relationship mapping

Or use the Python API:

from src.core.pptx_processor import pptx_to_images_and_notes
from src.processors.unified_transcript_generator import UnifiedTranscriptProcessor

# Convert PPTX and extract notes
result = pptx_to_images_and_notes("presentation.pptx", "output/")

# Generate transcripts
processor = UnifiedTranscriptProcessor()
transcripts = processor.process_slides_dataframe(result['notes_df'], "output/")

# Save results
transcripts.to_csv("transcripts.csv", index=False)

Project Structure

powerpoint-to-voiceover-transcript/
├── README.md                          # This file
├── config.yaml                        # Main configuration
├── pyproject.toml                     # Dependencies and project metadata
├── uv.lock                            # uv dependency lock file
├── narrative_continuity_workflow.ipynb # Narrative-aware workflow
├── .env.example                       # Environment template
├── input/                             # Place your PPTX files here
├── output/                            # Generated images and transcripts
└── src/
    ├── config/
    │   └── settings.py                # Configuration management
    ├── core/
    │   ├── file_utils.py              # File system utilities
    │   ├── image_processing.py        # Image encoding for API
    │   ├── llama_client.py            # Llama API integration
    │   └── pptx_processor.py          # PPTX extraction and conversion
    ├── processors/
    │   └── unified_transcript_generator.py # Unified processor (standard + narrative)
    └── utils/
        └── visualization.py           # Slide image display utilities

Configuration

The system uses config.yaml for settings:

# API Configuration
api:
  llama_model: "Llama-4-Maverick-17B-128E-Instruct-FP8"
  max_retries: 3

# Processing Settings
processing:
  default_dpi: 200
  default_format: "png"
  batch_size: 5

# Your Project
current_project:
  pptx_file: "input/your_presentation"
  extension: ".pptx"
  output_dir: "output/"
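
These values are loaded by src/config/settings.py at runtime. If you just want to inspect or edit config.yaml from a script, here is a minimal sketch using plain PyYAML rather than the project's own settings loader:

import yaml

# Load the configuration file; the keys mirror the example above.
with open("config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

print(config['api']['llama_model'])            # e.g. "Llama-4-Maverick-17B-128E-Instruct-FP8"
print(config['current_project']['pptx_file'])  # e.g. "input/your_presentation"

# Point the project at a different deck and write the file back.
config['current_project']['pptx_file'] = "input/another_presentation"
with open("config.yaml", "w", encoding="utf-8") as f:
    yaml.safe_dump(config, f, sort_keys=False)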

API Reference

Core Functions

pptx_to_images_and_notes(pptx_path, output_dir)

Converts PowerPoint to images and extracts speaker notes.

Returns: Dictionary with image_files, notes_df, and output_dir
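
For example (a minimal sketch; the file name is illustrative):

from src.core.pptx_processor import pptx_to_images_and_notes

# Convert the deck, then inspect the documented return keys.
result = pptx_to_images_and_notes("input/your_presentation.pptx", "output/")
print(len(result['image_files']))   # one rendered image per slide
print(result['notes_df'].head())    # pandas DataFrame of extracted speaker notes
print(result['output_dir'])         # directory the images were written to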

UnifiedTranscriptProcessor(use_narrative=True, context_window_size=5)

Main class for generating AI transcripts with configurable processing modes.

Parameters:

  • use_narrative (bool): Enable narrative continuity mode (default: True)
  • context_window_size (int): Number of previous slides to use as context (default: 5)

Methods:

  • process_slides_dataframe(df, output_dir, save_context=True) - Process all slides
  • process_single_slide(image_path, speaker_notes, slide_number, slide_title) - Process one slide
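
A minimal sketch of both methods, assuming the slides have already been converted with pptx_to_images_and_notes (the single-slide argument values below are illustrative):

from src.core.pptx_processor import pptx_to_images_and_notes
from src.processors.unified_transcript_generator import UnifiedTranscriptProcessor

result = pptx_to_images_and_notes("input/your_presentation.pptx", "output/")
processor = UnifiedTranscriptProcessor(use_narrative=True, context_window_size=5)

# Process every slide in the extracted notes DataFrame and keep the context files.
transcripts = processor.process_slides_dataframe(result['notes_df'], "output/", save_context=True)

# Process one slide in isolation; the path, notes, and title are illustrative values.
single = processor.process_single_slide(
    image_path="output/slide_01.png",
    speaker_notes="Welcome, and thanks for joining.",
    slide_number=1,
    slide_title="Introduction",
)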

Processing Modes

Standard Mode (use_narrative=False)

  • Best for: Simple presentations, quick processing, independent slides
  • Features: Fast execution, no context dependencies
  • Use cases: Training materials, product demos, standalone slides

Narrative Mode (use_narrative=True)

  • Best for: Story-driven presentations, complex topics, educational content
  • Features: Context awareness, smooth transitions, terminology consistency
  • Use cases: Conference talks, educational courses, marketing presentations
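
Both modes come from the same class; only the constructor flags differ:

from src.processors.unified_transcript_generator import UnifiedTranscriptProcessor

# Standard mode: every slide is processed independently.
standard_processor = UnifiedTranscriptProcessor(use_narrative=False)

# Narrative mode: the previous transcripts (five by default) are passed as context.
narrative_processor = UnifiedTranscriptProcessor(use_narrative=True, context_window_size=5)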

Visualization Utilities

display_slide_grid(image_files, max_cols=3, figsize_per_image=(4, 3))

Display slide images in a grid layout for Jupyter notebooks.

Parameters:

  • image_files (List): List of image file paths
  • max_cols (int): Maximum columns in grid (default: 3)
  • figsize_per_image (Tuple): Size of each image as (width, height) (default: (4, 3))

Example:

from src.utils.visualization import display_slide_grid, display_slide_preview

# Display first 6 slides in a 3-column grid
display_slide_grid(image_files[:6], max_cols=3, figsize_per_image=(4, 3))

# Or use the convenience function
display_slide_preview(image_files, num_slides=6, max_cols=3)

display_slide_preview(image_files, num_slides=6, max_cols=3, figsize_per_image=(4, 3))

Display a preview of the first N slide images with automatic grid layout.

Speech Optimization

The AI automatically converts technical content for natural speech:

  • Decimals: 3.2 → "three dot two"
  • Model names: LLaMA-3.2 → "LLaMA three dot two"
  • Abbreviations: LLM → "L L M"
  • Large numbers: 70B → "seventy billion"

Pronunciation rules like these are defined in the system prompt, so you can add your own.
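
For example, new rules can be phrased as plain instructions and appended to the system prompt; the snippet below is purely illustrative of the wording, and the variable name is not part of the project:

# Illustrative only: extra pronunciation rules written as plain instructions,
# intended to be appended to the transcript-generation system prompt.
EXTRA_PRONUNCIATION_RULES = """
- Read version numbers digit by digit: 2.5 becomes "two dot five".
- Spell out GPU as "G P U".
- Read 128K as "one hundred twenty-eight K".
"""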

Requirements

System Dependencies

  • LibreOffice: Required for PPTX to PDF conversion
  • Python 3.12+: Core runtime

Python Dependencies

  • pandas>=2.3.1 - Data processing
  • python-pptx>=1.0.2 - PowerPoint file handling
  • pymupdf>=1.24.0 - PDF to image conversion
  • llama-api-client>=0.1.0 - AI model access
  • pillow>=11.3.0 - Image processing
  • pyyaml>=6.0.0 - Configuration management
  • matplotlib>=3.5.0 - Visualization utilities

See pyproject.toml for complete dependency list.

Output

Narrative Continuity Workflow Output

Enhanced output includes:

  1. Narrative-Aware Transcripts: Context-aware voiceover content with smooth transitions
  2. Context Analysis: Information about how previous slides influenced each transcript
  3. Narrative Summary: Overall analysis of presentation flow and consistency
  4. Multiple Formats: CSV, JSON exports with context information (see the sketch after this list)
  5. Context Files: Detailed narrative context data for each slide
  6. Visual Preview: Grid display of slide images for verification
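
The CSV and JSON exports come from the pandas DataFrame returned by the processor, so they can also be regenerated after the fact (a minimal sketch; the file paths are illustrative):

import pandas as pd

# Re-export the saved transcripts; adjust the paths to your output directory.
transcripts = pd.read_csv("output/transcripts.csv")
transcripts.to_json("output/transcripts.json", orient="records", indent=2)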

Troubleshooting

Common Issues

"LibreOffice not found"

  • Install LibreOffice or update paths in config.yaml

"API key not found"

  • Set LLAMA_API_KEY in your .env file

"Permission denied"

  • Ensure write permissions to output directories

"Invalid image format"

  • Use supported formats: png, jpeg, jpg

"uv sync fails"

  • Make sure you have Python 3.12+ installed
  • Try uv python install 3.12 to install Python via uv

"Context window too large"

  • Reduce context_window_size parameter in narrative workflow
  • Default is 5 slides, try 3 for shorter presentations
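
Both the mode and the window size are constructor arguments, so this is a one-line change:

from src.processors.unified_transcript_generator import UnifiedTranscriptProcessor

# Use a smaller context window for shorter presentations.
processor = UnifiedTranscriptProcessor(use_narrative=True, context_window_size=3)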

"Images not displaying in notebook"

  • Ensure matplotlib is installed: pip install matplotlib
  • Check that image files exist in the output directory
  • Try restarting the Jupyter kernel