A production-ready tool that converts PowerPoint presentations into AI-generated voiceover transcripts using Meta's Llama vision models. Designed for creating professional narration content from slide decks.
This system extracts speaker notes and visual content from PowerPoint files, then uses advanced AI vision models to generate natural-sounding transcripts optimized for human voiceover or text-to-speech systems. The generated transcripts include proper pronunciation of technical terms, numbers, and model names.
1. **Install uv (if not already installed):**
```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or via pip
pip install uv
```
2. **Clone and install dependencies:**
```bash
git clone <repository-url>
cd powerpoint-to-voiceover-transcript
uv sync
```
3. **Activate the virtual environment:**
```bash
source .venv/bin/activate  # macOS/Linux
# or
.venv\Scripts\activate     # Windows
```
**Clone and install dependencies (using pip instead of uv):**
```bash
git clone https://github.com/meta-llama/llama-cookbook.git
cd powerpoint-to-voiceover-transcript
pip install -e .
```
**Install LibreOffice:**
```bash
# macOS
brew install --cask libreoffice

# Ubuntu/Debian
sudo apt-get install libreoffice
```
**Set up environment:**
```bash
cp .env.example .env
# Edit .env and add your LLAMA_API_KEY
```
**Configure your presentation:**
```yaml
# Edit config.yaml - update the pptx_file path
current_project:
  pptx_file: "input/your_presentation_name"
  extension: ".pptx"
```
**Run the main workflow notebook:**
```bash
jupyter notebook pptx_to_vo_workflow.ipynb
```
**Or use the Python API:**
```python
from src.core.pptx_processor import pptx_to_images_and_notes
from src.processors.transcript_generator import TranscriptProcessor

# Convert PPTX and extract notes
result = pptx_to_images_and_notes("presentation.pptx", "output/")

# Generate transcripts
processor = TranscriptProcessor()
transcripts = processor.process_slides_dataframe(result['notes_df'], "output/")

# Save results
transcripts.to_csv("transcripts.csv", index=False)
```
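The setup steps store `LLAMA_API_KEY` in `.env`, so it can be useful to confirm the key is readable before making any API calls. The following is a minimal sketch using only the standard library; the helper names `parse_env_text` and `require_api_key` are hypothetical and not part of this project, which may load `.env` in a different way.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def require_api_key(env_path: str = ".env", name: str = "LLAMA_API_KEY") -> str:
    """Return the key from the process environment, falling back to the .env file."""
    key = os.environ.get(name, "")
    path = Path(env_path)
    if not key and path.is_file():
        key = parse_env_text(path.read_text(encoding="utf-8")).get(name, "")
    if not key:
        raise RuntimeError(f"{name} is not set; add it to {env_path}")
    return key
```

Calling `require_api_key()` at the top of a script fails fast with a readable message instead of surfacing an authentication error partway through a long run.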
```
powerpoint-to-voiceover-transcript/
├── README.md                       # This file
├── config.yaml                     # Main configuration
├── pyproject.toml                  # Dependencies and project metadata
├── uv.lock                         # uv dependency lock file
├── pptx_to_vo_workflow.ipynb       # Main workflow notebook
├── .env.example                    # Environment template
├── input/                          # Place your PPTX files here
└── src/
    ├── config/
    │   └── settings.py             # Configuration management
    ├── core/
    │   ├── file_utils.py           # File system utilities
    │   ├── image_processing.py     # Image encoding for API
    │   ├── llama_client.py         # Llama API integration
    │   └── pptx_processor.py       # PPTX extraction and conversion
    └── processors/
        └── transcript_generator.py # AI transcript generation
```
The system uses `config.yaml` for settings:
```yaml
# API Configuration
api:
  llama_model: "Llama-4-Maverick-17B-128E-Instruct-FP8"
  max_retries: 3

# Processing Settings
processing:
  default_dpi: 200
  default_format: "png"
  batch_size: 5

# Your Project
current_project:
  pptx_file: "input/your_presentation"
  extension: ".pptx"
  output_dir: "output/"
```
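To illustrate how these settings might be consumed, here is a rough sketch that takes the parsed configuration as a plain dictionary (for example, the result of `yaml.safe_load`). It rests on several assumptions that the project does not confirm: that `pptx_file` and `extension` are concatenated to form the input path, that `batch_size` groups slides into consecutive chunks, and that `max_retries` counts retries after the first attempt. The function names are hypothetical.

```python
import time
from pathlib import Path
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def resolve_pptx_path(config: dict) -> Path:
    """Join pptx_file and extension from the current_project section (assumed behavior)."""
    project = config["current_project"]
    return Path(project["pptx_file"] + project["extension"])


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def call_with_retries(func: Callable[[], T], max_retries: int, delay_s: float = 1.0) -> T:
    """Call func, retrying up to max_retries times with exponential backoff."""
    if max_retries < 0:
        raise ValueError("max_retries must not be negative")
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            # Re-raise once the retry budget is exhausted.
            if attempt == max_retries:
                raise
            time.sleep(delay_s * (2 ** attempt))
    raise AssertionError("unreachable")
```

With the values shown above, a 12-slide deck would be split into batches of 5, 5, and 2 slides, and each API call would be attempted at most four times.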
`pptx_to_images_and_notes(pptx_path, output_dir)`

Converts PowerPoint to images and extracts speaker notes.

Returns: Dictionary with `image_files`, `notes_df`, and `output_dir`

`TranscriptProcessor()`

Main class for generating AI transcripts.

Methods:
- `process_slides_dataframe(df, output_dir)` - Process all slides
- `process_single_slide(image_path, speaker_notes)` - Process one slide

The AI automatically converts technical content for natural speech:
- 3.2 → "three dot two"
- LLaMA-3.2 → "LLaMA three dot two"
- LLM → "L L M"
- 70B → "seventy billion"

Key dependencies:
- `pandas>=2.3.1` - Data processing
- `python-pptx>=1.0.2` - PowerPoint file handling
- `pymupdf>=1.24.0` - PDF to image conversion
- `llama-api-client>=0.1.0` - AI model access
- `pillow>=11.3.0` - Image processing
- `pyyaml>=6.0.0` - Configuration management

See `pyproject.toml` for complete dependency list.
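To make the speech conversions listed earlier concrete (for example, 3.2 → "three dot two"), the sketch below reproduces the four documented examples with plain regular expressions. This is not how the project does it: the README states the conversion is performed by the AI model, and this rule-based version is only an illustration of the target output. All names here are hypothetical, and the rules cover only one- or two-digit numbers and short all-caps acronyms.

```python
import re

DIGITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]


def small_number(n: int) -> str:
    """Spell out an integer from 0 to 99."""
    if n < 10:
        return DIGITS[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{DIGITS[ones]}"


def speak_token(token: str) -> str:
    """Convert one token to a speakable form, or return it unchanged."""
    # Parameter counts such as 70B.
    match = re.fullmatch(r"(\d{1,2})B", token)
    if match:
        return f"{small_number(int(match.group(1)))} billion"
    # Version numbers such as 3.2.
    match = re.fullmatch(r"(\d{1,2})\.(\d)", token)
    if match:
        return f"{small_number(int(match.group(1)))} dot {DIGITS[int(match.group(2))]}"
    # Short all-caps acronyms such as LLM are spelled letter by letter.
    if re.fullmatch(r"[A-Z]{2,5}", token):
        return " ".join(token)
    return token


def speak(text: str) -> str:
    """Split on whitespace and hyphens so LLaMA-3.2 becomes two tokens."""
    tokens = re.split(r"[\s-]+", text.strip())
    return " ".join(speak_token(token) for token in tokens)
```

A fixed rule set like this breaks quickly on real slide text (acronyms that are pronounced as words, numbers followed by punctuation), which is one reason to delegate the conversion to a language model.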
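The system generates slide images, the extracted speaker notes, and the voiceover transcripts.

Common errors and fixes:
- "LibreOffice not found": install LibreOffice and check the settings in `config.yaml`
- "API key not found": set `LLAMA_API_KEY` in your `.env` file
- "Permission denied": check file and directory permissions
- "Invalid image format": use a supported format: `png`, `jpeg`, `jpg`
- "uv sync fails": run `uv python install 3.12` to install Python via uv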
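Several of these errors can be caught before a run starts. The sketch below is one possible pre-flight check, written under assumptions: it looks for a `soffice` or `libreoffice` executable on `PATH` (the project may locate LibreOffice differently, for instance through `config.yaml`), reads `LLAMA_API_KEY` from the process environment, and treats the three formats listed above as the full supported set. The function names are hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Optional

SUPPORTED_FORMATS = {"png", "jpeg", "jpg"}


def find_libreoffice() -> Optional[str]:
    """Return the path of a LibreOffice executable on PATH, if any."""
    for name in ("soffice", "libreoffice"):
        found = shutil.which(name)
        if found:
            return found
    return None


def preflight(output_dir: str, image_format: str) -> list[str]:
    """Collect human-readable problems instead of stopping at the first one."""
    problems: list[str] = []
    if find_libreoffice() is None:
        problems.append("LibreOffice not found on PATH")
    if not os.environ.get("LLAMA_API_KEY"):
        problems.append("LLAMA_API_KEY is not set")
    if image_format.lower() not in SUPPORTED_FORMATS:
        problems.append(f"Invalid image format: {image_format}")
    # Check write access on the output directory or its nearest existing parent.
    out = Path(output_dir)
    existing = next((p for p in [out, *out.parents] if p.exists()), Path("."))
    if not os.access(existing, os.W_OK):
        problems.append(f"Permission denied: cannot write to {existing}")
    return problems
```

An empty list from `preflight("output/", "png")` means none of the checked conditions failed; it does not guarantee that the API key is valid or that the conversion will succeed.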