# PowerPoint to Voiceover Transcript

A production-ready tool that converts PowerPoint presentations into AI-generated voiceover transcripts using Meta's Llama vision models. Designed for creating professional narration content from slide decks.

## Overview

This system extracts speaker notes and visual content from PowerPoint files, then uses advanced AI vision models to generate natural-sounding transcripts optimized for human voiceover or text-to-speech systems. The generated transcripts include proper pronunciation of technical terms, numbers, and model names.

## Key Features

- **AI-Powered Analysis**: Uses Llama vision models to understand slide content and context
- **Speech Optimization**: Converts numbers, decimals, and technical terms to spoken form
- **Flexible Processing**: Supports both individual slides and batch processing
- **Cross-Platform**: Works on Windows, macOS, and Linux
- **Production Ready**: Comprehensive error handling, progress tracking, and retry logic

## Quick Start

### Prerequisites

- Python 3.12+
- LibreOffice (for PPTX conversion)
- Llama API key

### Installation

#### Option 1: Using uv (Recommended - Faster)

1. **Install uv** (if not already installed):
   ```bash
   # macOS/Linux
   curl -LsSf https://astral.sh/uv/install.sh | sh

   # Windows
   powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

   # Or via pip
   pip install uv
   ```

2. **Clone and install dependencies:**
   ```bash
   git clone <repository-url>
   cd powerpoint-to-voiceover-transcript
   uv sync
   ```

3. **Activate the virtual environment:**
   ```bash
   source .venv/bin/activate  # macOS/Linux
   # or
   .venv\Scripts\activate     # Windows
   ```

#### Option 2: Using pip (Traditional)

1. **Clone and install dependencies:**
   ```bash
   git clone https://github.com/meta-llama/llama-cookbook.git
   cd powerpoint-to-voiceover-transcript
   pip install -e .
   ```

2. **Install LibreOffice:**
   - macOS: `brew install --cask libreoffice`
   - Ubuntu: `sudo apt-get install libreoffice`
   - Windows: Download from [libreoffice.org](https://www.libreoffice.org/)

3. **Set up environment:**
   ```bash
   cp .env.example .env
   # Edit .env and add your LLAMA_API_KEY
   ```

4. **Configure your presentation:**
   ```yaml
   # Edit config.yaml - update the pptx_file path
   current_project:
     pptx_file: "input/your_presentation_name"
     extension: ".pptx"
   ```
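For reference, the `.env` created in step 3 only needs the API key; the value below is a placeholder:

```bash
# .env - replace the placeholder with your actual key
LLAMA_API_KEY=your_api_key_here
```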

### Basic Usage

Run the main workflow notebook:

```bash
jupyter notebook pptx_to_vo_workflow.ipynb
```

Or use the Python API:

```python
from src.core.pptx_processor import pptx_to_images_and_notes
from src.processors.transcript_generator import TranscriptProcessor

# Convert PPTX and extract notes
result = pptx_to_images_and_notes("presentation.pptx", "output/")

# Generate transcripts
processor = TranscriptProcessor()
transcripts = processor.process_slides_dataframe(result['notes_df'], "output/")

# Save results
transcripts.to_csv("transcripts.csv", index=False)
```

## Project Structure

```
powerpoint-to-voiceover-transcript/
├── README.md                     # This file
├── config.yaml                   # Main configuration
├── pyproject.toml                # Dependencies and project metadata
├── uv.lock                       # uv dependency lock file
├── pptx_to_vo_workflow.ipynb     # Main workflow notebook
├── .env.example                  # Environment template
├── input/                        # Place your PPTX files here
└── src/
    ├── config/
    │   └── settings.py           # Configuration management
    ├── core/
    │   ├── file_utils.py         # File system utilities
    │   ├── image_processing.py   # Image encoding for API
    │   ├── llama_client.py       # Llama API integration
    │   └── pptx_processor.py     # PPTX extraction and conversion
    └── processors/
        └── transcript_generator.py # AI transcript generation
```

## Configuration

The system uses `config.yaml` for settings:

```yaml
# API Configuration
api:
  llama_model: "Llama-4-Maverick-17B-128E-Instruct-FP8"
  max_retries: 3

# Processing Settings
processing:
  default_dpi: 200
  default_format: "png"
  batch_size: 5

# Your Project
current_project:
  pptx_file: "input/your_presentation"
  extension: ".pptx"
  output_dir: "output/"
```
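If you need these settings outside the provided notebook, a minimal PyYAML sketch is shown below. The project's own `src/config/settings.py` may expose a different interface; `load_config` and its return shape here are assumptions for illustration.

```python
import yaml

def load_config(path: str = "config.yaml") -> dict:
    """Hypothetical helper: parse config.yaml into a plain dict."""
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)

config = load_config()
print(config["api"]["llama_model"])  # model name configured above
print(config["current_project"]["pptx_file"] + config["current_project"]["extension"])
```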

## API Reference

### Core Functions

#### `pptx_to_images_and_notes(pptx_path, output_dir)`

Converts PowerPoint to images and extracts speaker notes.

**Returns**: Dictionary with `image_files`, `notes_df`, and `output_dir`
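A quick illustration of the returned dictionary, using the keys listed above (the comments describe the expected contents):

```python
from src.core.pptx_processor import pptx_to_images_and_notes

result = pptx_to_images_and_notes("presentation.pptx", "output/")

image_files = result["image_files"]  # rendered slide images
notes_df = result["notes_df"]        # DataFrame of per-slide speaker notes
output_dir = result["output_dir"]    # directory the files were written to
```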

#### `TranscriptProcessor()`

Main class for generating AI transcripts.

**Methods**:

- `process_slides_dataframe(df, output_dir)` - Process all slides
- `process_single_slide(image_path, speaker_notes)` - Process one slide
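For a single slide, a minimal sketch based on the signature above (the argument values are placeholders; check the workflow notebook for the exact return type):

```python
from src.processors.transcript_generator import TranscriptProcessor

processor = TranscriptProcessor()
transcript = processor.process_single_slide(
    image_path="output/slide_01.png",            # placeholder path to one rendered slide
    speaker_notes="Optional notes for context",  # placeholder speaker notes
)
print(transcript)
```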

## Speech Optimization

The AI automatically converts technical content for natural speech:

- Decimals: `3.2` → "three dot two"
- Model names: `LLaMA-3.2` → "LLaMA three dot two"
- Abbreviations: `LLM` → "L L M"
- Large numbers: `70B` → "seventy billion"
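These conversions are performed by the model during transcript generation, not by post-processing code. Still, a rough rule-based sketch of the same transformations (illustration only, covering just the examples above) might look like this:

```python
import re

# Illustration of the conversions listed above; the actual tool relies on
# the Llama model's output, not these regexes.
DIGIT_WORDS = "zero one two three four five six seven eight nine".split()

def rough_speech_normalize(text: str) -> str:
    # Large-number suffix (only the example from the list is handled)
    text = re.sub(r"\b70B\b", "seventy billion", text)

    # Decimals: read each digit, with "dot" for the decimal point (3.2 -> "three dot two")
    def decimal_to_words(match: re.Match) -> str:
        parts = match.group(0).split(".")
        return " dot ".join(" ".join(DIGIT_WORDS[int(d)] for d in part) for part in parts)

    text = re.sub(r"\b\d+\.\d+\b", decimal_to_words, text)

    # Abbreviations: spell out letter by letter (LLM -> "L L M")
    text = re.sub(r"\bLLM\b", "L L M", text)
    return text

print(rough_speech_normalize("LLaMA-3.2 is an LLM with 70B parameters"))
# -> LLaMA-three dot two is an L L M with seventy billion parameters
```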

## Requirements

### System Dependencies

- **LibreOffice**: Required for PPTX to PDF conversion
- **Python 3.12+**: Core runtime

### Python Dependencies

- `pandas>=2.3.1` - Data processing
- `python-pptx>=1.0.2` - PowerPoint file handling
- `pymupdf>=1.24.0` - PDF to image conversion
- `llama-api-client>=0.1.0` - AI model access
- `pillow>=11.3.0` - Image processing
- `pyyaml>=6.0.0` - Configuration management

See `pyproject.toml` for the complete dependency list.

## Output

The system generates:

1. **Slide Images**: High-resolution PNG/JPEG files
2. **Notes DataFrame**: Structured data with slide metadata
3. **AI Transcripts**: Speech-optimized voiceover content
4. **CSV Export**: Complete results for further processing
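The CSV export is a regular pandas-readable file; a quick way to inspect it (the column names depend on the generated DataFrame, so treat `"transcript"` below as a placeholder):

```python
import pandas as pd

results = pd.read_csv("transcripts.csv")
print(results.head())                    # inspect slide metadata and generated text
# print(results["transcript"].iloc[0])   # placeholder column name - adjust to the actual schema
```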

## Troubleshooting

### Common Issues

"LibreOffice not found"

  • Install LibreOffice or update paths in config.yaml

"API key not found"

  • Set LLAMA_API_KEY in your .env file

"Permission denied"

  • Ensure write permissions to output directories

"Invalid image format"

  • Use supported formats: png, jpeg, jpg

"uv sync fails"

  • Make sure you have Python 3.12+ installed
  • Try uv python install 3.12 to install Python via uv
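If you're not sure which prerequisite is missing, two quick checks from a terminal can help (this assumes `soffice` is on your PATH, which may not be the case on Windows by default):

```bash
soffice --version   # LibreOffice command-line launcher
python --version    # should report 3.12 or newer
```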

Ready to convert your presentations to professional voiceover content! 🎙️