A Llama 4 powered solution that converts PowerPoint presentations into text-to-speech ready voiceover transcripts. Designed for creating professional narration content from slide decks.
This system extracts speaker notes and visual content from PowerPoint files, then uses the Llama 4 Maverick model to generate natural-sounding transcripts optimized for human voiceover or text-to-speech systems. The generated transcripts include proper pronunciation of technical terms, numbers, and model names.
1. **Install uv (if not already installed):**

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or via pip
pip install uv
```
2. **Clone and install dependencies:**

```bash
git clone <repository-url>
cd powerpoint-to-voiceover-transcript
uv sync
```

3. **Activate the virtual environment:**

```bash
source .venv/bin/activate   # macOS/Linux
# or
.venv\Scripts\activate      # Windows
```
Alternatively, clone and install with pip:

```bash
git clone https://github.com/meta-llama/llama-cookbook.git
cd powerpoint-to-voiceover-transcript
pip install -e .
```
**Install LibreOffice:**

```bash
# macOS
brew install --cask libreoffice

# Ubuntu/Debian
sudo apt-get install libreoffice
```

**Set up environment:**

```bash
cp .env.example .env
# Edit .env and add your LLAMA_API_KEY
```
**Configure your presentation:**

```yaml
# Edit config.yaml - update the pptx_file path
current_project:
  pptx_file: "input/your_presentation_name"
  extension: ".pptx"
```
For presentations requiring smooth narrative flow and consistent terminology:

```bash
jupyter notebook narrative_continuity_workflow.ipynb
```
This workflow uses previous slide transcripts as context to maintain narrative continuity and ensure smooth transitions between slides.
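The context mechanism amounts to a sliding window over prior transcripts. The sketch below is a minimal standalone illustration of that idea, not the package's actual implementation; `build_context` is a hypothetical helper.

```python
def build_context(transcripts, window_size=5):
    """Return the most recent transcripts to prepend as model context.

    Mirrors the idea behind context_window_size: only the last
    `window_size` slide transcripts are carried forward.
    """
    return transcripts[-window_size:] if window_size > 0 else []

history = []
for slide in ["Slide 1 intro.", "Slide 2 details.", "Slide 3 demo.",
              "Slide 4 results.", "Slide 5 recap.", "Slide 6 outro."]:
    context = build_context(history, window_size=2)
    # A real prompt would combine `context` with the new slide's notes
    # and image; here we only record the finished transcript.
    history.append(slide)
```

With `window_size=2`, only the two most recent transcripts inform each new slide, which bounds prompt length while preserving local continuity.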
Or use the Python API:

```python
from src.core.pptx_processor import pptx_to_images_and_notes
from src.processors.unified_transcript_generator import UnifiedTranscriptProcessor

# Convert PPTX and extract notes
result = pptx_to_images_and_notes("presentation.pptx", "output/")

# Generate transcripts
processor = UnifiedTranscriptProcessor()
transcripts = processor.process_slides_dataframe(result['notes_df'], "output/")

# Save results
transcripts.to_csv("transcripts.csv", index=False)
```
```
powerpoint-to-voiceover-transcript/
├── README.md                            # This file
├── config.yaml                          # Main configuration
├── pyproject.toml                       # Dependencies and project metadata
├── uv.lock                              # uv dependency lock file
├── narrative_continuity_workflow.ipynb  # Narrative-aware workflow
├── .env.example                         # Environment template
├── input/                               # Place your PPTX files here
├── output/                              # Generated images and transcripts
└── src/
    ├── config/
    │   └── settings.py                  # Configuration management
    ├── core/
    │   ├── file_utils.py                # File system utilities
    │   ├── image_processing.py          # Image encoding for API
    │   ├── llama_client.py              # Llama API integration
    │   └── pptx_processor.py            # PPTX extraction and conversion
    ├── processors/
    │   └── unified_transcript_generator.py  # Unified processor (standard + narrative)
    └── utils/
        └── visualization.py             # Slide image display utilities
```
The system uses `config.yaml` for settings:

```yaml
# API Configuration
api:
  llama_model: "Llama-4-Maverick-17B-128E-Instruct-FP8"
  max_retries: 3

# Processing Settings
processing:
  default_dpi: 200
  default_format: "png"
  batch_size: 5

# Your Project
current_project:
  pptx_file: "input/your_presentation"
  extension: ".pptx"
  output_dir: "output/"
```
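As a rough sketch of how these settings are consumed, the snippet below parses the same YAML with PyYAML and resolves the full presentation path. The real project centralizes this in `src/config/settings.py`; this standalone version only assumes the keys shown above.

```python
import yaml

# Inline copy of the config for illustration; normally you would use
# yaml.safe_load(open("config.yaml")).
CONFIG_TEXT = """
api:
  llama_model: "Llama-4-Maverick-17B-128E-Instruct-FP8"
  max_retries: 3
processing:
  default_dpi: 200
  default_format: "png"
  batch_size: 5
current_project:
  pptx_file: "input/your_presentation"
  extension: ".pptx"
  output_dir: "output/"
"""

config = yaml.safe_load(CONFIG_TEXT)

# The input path is the pptx_file stem plus the configured extension.
project = config["current_project"]
pptx_path = project["pptx_file"] + project["extension"]
print(pptx_path)  # input/your_presentation.pptx
```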
**`pptx_to_images_and_notes(pptx_path, output_dir)`**

Converts PowerPoint to images and extracts speaker notes.

Returns: Dictionary with `image_files`, `notes_df`, and `output_dir`.
**`UnifiedTranscriptProcessor(use_narrative=True, context_window_size=5)`**

Main class for generating AI transcripts with configurable processing modes: standard mode (`use_narrative=False`) processes each slide independently, while narrative mode (`use_narrative=True`) uses previous slides as context.

Parameters:
- `use_narrative` (bool): Enable narrative continuity mode (default: True)
- `context_window_size` (int): Number of previous slides to use as context (default: 5)

Methods:
- `process_slides_dataframe(df, output_dir, save_context=True)` - Process all slides
- `process_single_slide(image_path, speaker_notes, slide_number, slide_title)` - Process one slide

**`display_slide_grid(image_files, max_cols=3, figsize_per_image=(4, 3))`**

Display slide images in a grid layout for Jupyter notebooks.

Parameters:
- `image_files` (List): List of image file paths
- `max_cols` (int): Maximum columns in grid (default: 3)
- `figsize_per_image` (Tuple): Size of each image as (width, height) (default: (4, 3))

Example:
```python
from src.utils.visualization import display_slide_grid, display_slide_preview

# Display first 6 slides in a 3-column grid
display_slide_grid(image_files[:6], max_cols=3, figsize_per_image=(4, 3))

# Or use the convenience function
display_slide_preview(image_files, num_slides=6, max_cols=3)
```
**`display_slide_preview(image_files, num_slides=6, max_cols=3, figsize_per_image=(4, 3))`**

Display a preview of the first N slide images with automatic grid layout.
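The grid layout reduces to a rows/columns computation like the one below. This is an illustrative sketch with a hypothetical `grid_dims` helper, not the actual `visualization.py` code.

```python
import math

def grid_dims(n_images, max_cols=3):
    """Rows and columns needed to lay out n_images in a grid."""
    cols = min(n_images, max_cols) or 1          # at least one column
    rows = math.ceil(n_images / cols) if n_images else 0
    return rows, cols

# 6 slides at max_cols=3 -> 2 full rows of 3
print(grid_dims(6, max_cols=3))   # (2, 3)
# 7 slides -> 3 rows, last row partially filled
print(grid_dims(7, max_cols=3))   # (3, 3)
```

Each cell would then be drawn at `figsize_per_image`, so the overall figure size scales with the grid dimensions.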
The AI automatically converts technical content for natural speech:

- `3.2` → "three dot two"
- `LLaMA-3.2` → "LLaMA three dot two"
- `LLM` → "L L M"
- `70B` → "seventy billion"

Add your own rules in the system prompt.
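These rewrites are performed by the model via the system prompt rather than by code, but the kinds of transformations involved can be illustrated with a small regex sketch (entirely hypothetical; `speakable` and its tiny number mapping are inventions for this example):

```python
import re

NUMBER_WORDS = {"70": "seventy"}  # minimal mapping for the example

def speakable(text):
    """Apply a few illustrative pronunciation rewrites."""
    # 70B -> "seventy billion" (only for numbers in the mapping above)
    text = re.sub(r"\b(\d+)B\b",
                  lambda m: NUMBER_WORDS.get(m.group(1), m.group(1)) + " billion",
                  text)
    # Decimal versions: 3.2 -> "three dot two"
    digits = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
              "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}
    text = re.sub(r"\b(\d)\.(\d)\b",
                  lambda m: f"{digits[m.group(1)]} dot {digits[m.group(2)]}",
                  text)
    # Initialisms: LLM -> "L L M"
    text = text.replace("LLM", "L L M")
    return text

print(speakable("LLaMA-3.2 is a 70B LLM"))
# LLaMA-three dot two is a seventy billion L L M
```

In the actual pipeline, equivalent instructions live in the system prompt, which is also where you would add project-specific pronunciation rules.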
- `pandas>=2.3.1` - Data processing
- `python-pptx>=1.0.2` - PowerPoint file handling
- `pymupdf>=1.24.0` - PDF to image conversion
- `llama-api-client>=0.1.0` - AI model access
- `pillow>=11.3.0` - Image processing
- `pyyaml>=6.0.0` - Configuration management
- `matplotlib>=3.5.0` - Visualization utilities

See `pyproject.toml` for the complete dependency list.
Common issues:

- **"LibreOffice not found"** - Install LibreOffice or set its path in `config.yaml`
- **"API key not found"** - Set `LLAMA_API_KEY` in your `.env` file
- **"Permission denied"** - Check file permissions on the input and output directories
- **"Invalid image format"** - Use a supported format: `png`, `jpeg`, or `jpg`
- **"uv sync fails"** - Run `uv python install 3.12` to install Python via uv
- **"Context window too large"** - Reduce the `context_window_size` parameter in the narrative workflow
- **"Images not displaying in notebook"** - Run `pip install matplotlib`