
PowerPoint to Voiceover Transcript

A solution powered by Llama 4 that converts PowerPoint presentations into text-to-speech-ready voiceover transcripts, designed for creating professional narration content from slide decks.

Overview

This system extracts speaker notes and visual content from PowerPoint files, then uses the Llama 4 Maverick model to generate natural-sounding transcripts optimized for human voiceover or text-to-speech systems. The generated transcripts spell out technical terms, numbers, and model names so they are pronounced correctly.

Key Features

  • AI-Powered Analysis: Uses Llama 4 Maverick to understand slide content and context
  • Unified Processing: Single processor handles both standard and narrative-aware modes
  • Narrative Continuity: Optional context-aware processing maintains smooth transitions
  • Speech Optimization: Converts numbers, decimals, and technical terms to spoken form
  • Visualization Tools: Built-in utilities for displaying slide images in Jupyter notebooks
  • Flexible Configuration: Toggle between processing modes with simple flags
  • Cross-Platform: Works on Windows, macOS, and Linux
  • Production Ready: Comprehensive error handling, progress tracking, and retry logic

Quick Start

Prerequisites

  • Python 3.12+
  • LibreOffice (for PPTX conversion)
  • Llama API key

Installation

Option 1: Using uv (Recommended - Faster)

  1. Install uv (if not already installed):

    # macOS/Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh

    # Windows
    powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

    # Or via pip
    pip install uv
    
  2. Clone and install dependencies:

    git clone <repository-url>
    cd powerpoint-to-voiceover-transcript
    uv sync
    
  3. Activate the virtual environment:

    source .venv/bin/activate  # macOS/Linux
    # or
    .venv\Scripts\activate     # Windows
    

Option 2: Using pip (Traditional)

  1. Clone and install dependencies:

    git clone https://github.com/meta-llama/llama-cookbook.git
    cd powerpoint-to-voiceover-transcript
    pip install -e .
    
  2. Install LibreOffice:

    • macOS: brew install --cask libreoffice
    • Ubuntu: sudo apt-get install libreoffice
    • Windows: Download from libreoffice.org
  3. Set up environment:

    cp .env.example .env
    # Edit .env and add your LLAMA_API_KEY
    
  4. Configure your presentation:

    # Edit config.yaml - update the pptx_file path
    current_project:
      pptx_file: "input/your_presentation_name"
      extension: ".pptx"
    

Basic Usage

Narrative Continuity Workflow

For presentations requiring smooth narrative flow and consistent terminology:

jupyter notebook narrative_continuity_workflow.ipynb

This workflow uses previous slide transcripts as context to maintain narrative continuity and ensure smooth transitions between slides. Features include:

  • Context-aware processing: Uses 5 previous slides as context by default
  • Consistent terminology: Maintains terminology consistency throughout the presentation
  • Smooth transitions: Generates natural flow between slides
  • Enhanced output: Includes narrative context analysis and relationship mapping

Or use the Python API:

from src.core.pptx_processor import pptx_to_images_and_notes
from src.processors.unified_transcript_generator import UnifiedTranscriptProcessor

# Convert PPTX and extract notes
result = pptx_to_images_and_notes("presentation.pptx", "output/")

# Generate transcripts
processor = UnifiedTranscriptProcessor()
transcripts = processor.process_slides_dataframe(result['notes_df'], "output/")

# Save results
transcripts.to_csv("transcripts.csv", index=False)

Project Structure

powerpoint-to-voiceover-transcript/
├── README.md                          # This file
├── config.yaml                        # Main configuration
├── pyproject.toml                     # Dependencies and project metadata
├── uv.lock                            # uv dependency lock file
├── narrative_continuity_workflow.ipynb # Narrative-aware workflow
├── .env.example                       # Environment template
├── input/                             # Place your PPTX files here
├── output/                            # Generated images and transcripts
└── src/
    ├── config/
    │   └── settings.py                # Configuration management
    ├── core/
    │   ├── file_utils.py              # File system utilities
    │   ├── image_processing.py        # Image encoding for API
    │   ├── llama_client.py            # Llama API integration
    │   └── pptx_processor.py          # PPTX extraction and conversion
    ├── processors/
    │   └── unified_transcript_generator.py # Unified processor (standard + narrative)
    └── utils/
        └── visualization.py           # Slide image display utilities

Configuration

The system uses config.yaml for settings:

# API Configuration
api:
  llama_model: "Llama-4-Maverick-17B-128E-Instruct-FP8"
  max_retries: 3

# Processing Settings
processing:
  default_dpi: 200
  default_format: "png"
  batch_size: 5

# Your Project
current_project:
  pptx_file: "input/your_presentation"
  extension: ".pptx"
  output_dir: "output/"
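
These values are loaded by src/config/settings.py at runtime. If you just want to inspect or edit config.yaml from a script, here is a minimal sketch using plain PyYAML rather than the project's own settings loader:

import yaml

# Load the configuration file; the keys mirror the example above.
with open("config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

print(config['api']['llama_model'])            # e.g. "Llama-4-Maverick-17B-128E-Instruct-FP8"
print(config['current_project']['pptx_file'])  # e.g. "input/your_presentation"

# Point the project at a different deck and write the file back.
config['current_project']['pptx_file'] = "input/another_presentation"
with open("config.yaml", "w", encoding="utf-8") as f:
    yaml.safe_dump(config, f, sort_keys=False)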

API Reference

Core Functions

pptx_to_images_and_notes(pptx_path, output_dir)

Converts PowerPoint to images and extracts speaker notes.

Returns: Dictionary with image_files, notes_df, and output_dir
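
For example (a minimal sketch; the file name is illustrative):

from src.core.pptx_processor import pptx_to_images_and_notes

# Convert the deck, then inspect the documented return keys.
result = pptx_to_images_and_notes("input/your_presentation.pptx", "output/")
print(len(result['image_files']))   # one rendered image per slide
print(result['notes_df'].head())    # pandas DataFrame of extracted speaker notes
print(result['output_dir'])         # directory the images were written to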

UnifiedTranscriptProcessor(use_narrative=True, context_window_size=5)

Main class for generating AI transcripts with configurable processing modes.

Parameters:

  • use_narrative (bool): Enable narrative continuity mode (default: True)
  • context_window_size (int): Number of previous slides to use as context (default: 5)

Methods:

  • process_slides_dataframe(df, output_dir, save_context=True) - Process all slides
  • process_single_slide(image_path, speaker_notes, slide_number, slide_title) - Process one slide
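
A minimal sketch of both methods, assuming the slides have already been converted with pptx_to_images_and_notes (the single-slide argument values below are illustrative):

from src.core.pptx_processor import pptx_to_images_and_notes
from src.processors.unified_transcript_generator import UnifiedTranscriptProcessor

result = pptx_to_images_and_notes("input/your_presentation.pptx", "output/")
processor = UnifiedTranscriptProcessor(use_narrative=True, context_window_size=5)

# Process every slide in the extracted notes DataFrame and keep the context files.
transcripts = processor.process_slides_dataframe(result['notes_df'], "output/", save_context=True)

# Process one slide in isolation; the path, notes, and title are illustrative values.
single = processor.process_single_slide(
    image_path="output/slide_01.png",
    speaker_notes="Welcome, and thanks for joining.",
    slide_number=1,
    slide_title="Introduction",
)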

Processing Modes

Standard Mode (use_narrative=False)

  • Best for: Simple presentations, quick processing, independent slides
  • Features: Fast execution, no context dependencies
  • Use cases: Training materials, product demos, standalone slides

Narrative Mode (use_narrative=True)

  • Best for: Story-driven presentations, complex topics, educational content
  • Features: Context awareness, smooth transitions, terminology consistency
  • Use cases: Conference talks, educational courses, marketing presentations
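
Both modes come from the same class; only the constructor flags differ:

from src.processors.unified_transcript_generator import UnifiedTranscriptProcessor

# Standard mode: every slide is processed independently.
standard_processor = UnifiedTranscriptProcessor(use_narrative=False)

# Narrative mode: the previous transcripts (five by default) are passed as context.
narrative_processor = UnifiedTranscriptProcessor(use_narrative=True, context_window_size=5)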

Visualization Utilities

display_slide_grid(image_files, max_cols=3, figsize_per_image=(4, 3))

Display slide images in a grid layout for Jupyter notebooks.

Parameters:

  • image_files (List): List of image file paths
  • max_cols (int): Maximum columns in grid (default: 3)
  • figsize_per_image (Tuple): Size of each image as (width, height) (default: (4, 3))

Example:

from src.utils.visualization import display_slide_grid, display_slide_preview

# Display first 6 slides in a 3-column grid
display_slide_grid(image_files[:6], max_cols=3, figsize_per_image=(4, 3))

# Or use the convenience function
display_slide_preview(image_files, num_slides=6, max_cols=3)

display_slide_preview(image_files, num_slides=6, max_cols=3, figsize_per_image=(4, 3))

Display a preview of the first N slide images with automatic grid layout.

Speech Optimization

The AI automatically converts technical content for natural speech:

  • Decimals: 3.2 → "three dot two"
  • Model names: LLaMA-3.2 → "LLaMA three dot two"
  • Abbreviations: LLM → "L L M"
  • Large numbers: 70B → "seventy billion"

Pronunciation rules like these are defined in the system prompt, so you can add your own.
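
For example, new rules can be phrased as plain instructions and appended to the system prompt; the snippet below is purely illustrative of the wording, and the variable name is not part of the project:

# Illustrative only: extra pronunciation rules written as plain instructions,
# intended to be appended to the transcript-generation system prompt.
EXTRA_PRONUNCIATION_RULES = """
- Read version numbers digit by digit: 2.5 becomes "two dot five".
- Spell out GPU as "G P U".
- Read 128K as "one hundred twenty-eight K".
"""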

Requirements

System Dependencies

  • LibreOffice: Required for PPTX to PDF conversion
  • Python 3.12+: Core runtime

Python Dependencies

  • pandas>=2.3.1 - Data processing
  • python-pptx>=1.0.2 - PowerPoint file handling
  • pymupdf>=1.24.0 - PDF to image conversion
  • llama-api-client>=0.1.0 - AI model access
  • pillow>=11.3.0 - Image processing
  • pyyaml>=6.0.0 - Configuration management
  • matplotlib>=3.5.0 - Visualization utilities

See pyproject.toml for complete dependency list.

Output

Narrative Continuity Workflow Output

Enhanced output includes:

  1. Narrative-Aware Transcripts: Context-aware voiceover content with smooth transitions
  2. Context Analysis: Information about how previous slides influenced each transcript
  3. Narrative Summary: Overall analysis of presentation flow and consistency
  4. Multiple Formats: CSV, JSON exports with context information (see the sketch after this list)
  5. Context Files: Detailed narrative context data for each slide
  6. Visual Preview: Grid display of slide images for verification
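
The CSV and JSON exports come from the pandas DataFrame returned by the processor, so they can also be regenerated after the fact (a minimal sketch; the file paths are illustrative):

import pandas as pd

# Re-export the saved transcripts; adjust the paths to your output directory.
transcripts = pd.read_csv("output/transcripts.csv")
transcripts.to_json("output/transcripts.json", orient="records", indent=2)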

Troubleshooting

Common Issues

"LibreOffice not found"

  • Install LibreOffice or update paths in config.yaml

"API key not found"

  • Set LLAMA_API_KEY in your .env file

"Permission denied"

  • Ensure write permissions to output directories

"Invalid image format"

  • Use supported formats: png, jpeg, jpg

"uv sync fails"

  • Make sure you have Python 3.12+ installed
  • Try uv python install 3.12 to install Python via uv

"Context window too large"

  • Reduce context_window_size parameter in narrative workflow
  • Default is 5 slides, try 3 for shorter presentations
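
Both the mode and the window size are constructor arguments, so this is a one-line change:

from src.processors.unified_transcript_generator import UnifiedTranscriptProcessor

# Use a smaller context window for shorter presentations.
processor = UnifiedTranscriptProcessor(use_narrative=True, context_window_size=3)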

"Images not displaying in notebook"

  • Ensure matplotlib is installed: pip install matplotlib
  • Check that image files exist in the output directory
  • Try restarting the Jupyter kernel