AI-powered solution for converting PowerPoint presentations into professional, knowledge-enhanced voiceover transcripts using Groq's vision models.
This system transforms PowerPoint presentations into natural-sounding voiceover transcripts optimized for human narration and text-to-speech systems. It combines AI-powered content analysis with domain-specific knowledge integration to produce professional-quality transcripts.
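Groq's OpenAI-compatible vision endpoints accept images as base64-encoded data URLs, so each rendered slide has to be encoded before it can be sent to the model. A minimal sketch of that step (the function name is illustrative, not necessarily what the project's `image_processing.py` uses):

```python
import base64
from pathlib import Path

def encode_image_for_api(image_path: str) -> str:
    """Encode a slide image as a base64 data URL, the form that
    vision chat APIs expect for image inputs."""
    data = Path(image_path).read_bytes()
    b64 = base64.b64encode(data).decode("utf-8")
    suffix = Path(image_path).suffix.lstrip(".").lower() or "png"
    return f"data:image/{suffix};base64,{b64}"
```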
```bash
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and set up the project
git clone <repository-url>
cd powerpoint-to-voiceover-transcript
uv sync

# Activate the environment
source .venv/bin/activate    # macOS/Linux
# or .venv\Scripts\activate  # Windows
```
Alternatively, with pip:

```bash
git clone <repository-url>
cd powerpoint-to-voiceover-transcript
pip install -e .
```
```bash
# Copy the environment template
cp .env.example .env

# Edit .env and add your API key
echo "GROQ_API_KEY=your_api_key_here" >> .env
```

Then edit `config.yaml` and update the `pptx_file` path to point to your presentation.
```bash
# Standard workflow
jupyter notebook narrative_continuity_workflow.ipynb

# Knowledge-enhanced workflow
jupyter notebook knowledge_enhanced_workflow.ipynb
```
```python
from src.core.pptx_processor import pptx_to_images_and_notes
from src.processors.unified_transcript_generator import UnifiedTranscriptProcessor

# Extract slide images and speaker notes from the PowerPoint file
result = pptx_to_images_and_notes("presentation.pptx", "output/")

# Generate transcripts
processor = UnifiedTranscriptProcessor(use_narrative=False)
transcripts = processor.process_slides_dataframe(result['notes_df'], "output/")

# Save results
transcripts.to_csv("transcripts.csv", index=False)
```
```python
# Enable both narrative continuity and knowledge integration
processor = UnifiedTranscriptProcessor(
    use_narrative=True,
    context_window_size=5,
    enable_knowledge=True
)

transcripts = processor.process_slides_dataframe(
    result['notes_df'],
    "output/",
    save_context=True
)
```
The system follows a modular 3-stage pipeline:
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   PowerPoint    │────▶│     Content     │────▶│    Knowledge    │
│      File       │     │   Extraction    │     │    Retrieval    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                                         │
                                                         ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│     Output      │◀────│   Transcript    │◀────│ LLM Processing  │
│      Files      │     │   Generation    │     │  Vision & Text  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
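The three stages above can be sketched as a simple chain of calls. All function bodies here are placeholder stubs, not the project's real implementations; they only show how data flows between stages:

```python
# Stubbed sketch of the 3-stage pipeline (placeholder data, not real code)

def extract_content(pptx_path: str) -> list[dict]:
    # Stage 1: render slides to images and pull speaker notes
    return [{"slide": 1, "notes": "Welcome", "image": "slide_1.png"}]

def retrieve_knowledge(slides: list[dict]) -> dict[int, str]:
    # Stage 2: look up relevant knowledge-base chunks per slide
    return {s["slide"]: "relevant background text" for s in slides}

def generate_transcripts(slides: list[dict], knowledge: dict[int, str]) -> list[str]:
    # Stage 3: send image + notes + knowledge to the LLM
    return [f"Slide {s['slide']}: {s['notes']}" for s in slides]

slides = extract_content("deck.pptx")
knowledge = retrieve_knowledge(slides)
transcripts = generate_transcripts(slides, knowledge)
```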
Stage implementations: content extraction (`pptx_processor.py`, built on `python-pptx`), knowledge retrieval (`markdown_knowledge.py`), and LLM processing (`groq_client.py` + `unified_transcript_generator.py`, built on the `groq` client).

```
powerpoint-to-voiceover-transcript/
├── PROJECT_DOCUMENTATION.md               # This comprehensive guide
├── config.yaml                            # Main configuration
├── pyproject.toml                         # Dependencies and metadata
├── .env.example                           # Environment template
├── knowledge_enhanced_workflow.ipynb      # Advanced workflow
├── narrative_continuity_workflow.ipynb    # Standard workflow
├── input/                                 # PowerPoint files
├── output/                                # Generated content
├── knowledge_base/                        # Domain knowledge files
└── src/
    ├── config/
    │   └── settings.py                    # Configuration management
    ├── core/
    │   ├── file_utils.py                  # File system utilities
    │   ├── image_processing.py            # Image encoding for API
    │   ├── groq_client.py                 # Groq API integration
    │   └── pptx_processor.py              # PPTX extraction and conversion
    ├── knowledge/
    │   ├── markdown_knowledge.py          # Knowledge base management
    │   └── context_manager.py             # Context integration
    ├── processors/
    │   └── unified_transcript_generator.py  # Main processing engine
    └── utils/
        └── visualization.py               # Slide display utilities
```
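`settings.py` manages configuration loaded from `config.yaml`. A common pattern for such a module is a dotted-path lookup over the nested config dict; the helper below is an illustrative sketch, not the project's actual API:

```python
from typing import Any

def get_setting(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Look up 'api.max_retries'-style keys in a nested config dict,
    returning a default when any path segment is missing."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```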
Edit `config.yaml` to enable the knowledge base:

```yaml
knowledge:
  enabled: true
  knowledge_base_dir: "knowledge_base"
```
```bash
mkdir knowledge_base
cd knowledge_base

# Create domain-specific files
touch company_overview.md
touch technical_glossary.md
touch product_specifications.md
touch presentation_guidelines.md
```
For the purposes of the cookbook, we're using local markdown files as the knowledge base. You can use any format you prefer, as long as it can be loaded and processed by the system.
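Before indexing, the markdown files are split into overlapping chunks, as the `max_chunk_size` and `chunk_overlap` settings below suggest. A minimal character-window chunker illustrating that scheme (an assumption about the implementation, not the project's actual code):

```python
def chunk_text(text: str, max_chunk_size: int = 1000,
               chunk_overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows, mirroring the
    max_chunk_size / chunk_overlap settings in config.yaml."""
    if max_chunk_size <= chunk_overlap:
        raise ValueError("max_chunk_size must exceed chunk_overlap")
    step = max_chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chunk_size])
        if start + max_chunk_size >= len(text):
            break
    return chunks
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from both neighboring chunks.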
```yaml
knowledge:
  # Core settings
  enabled: true
  knowledge_base_dir: "knowledge_base"

  # Embedding model configuration
  embedding:
    model_name: "all-MiniLM-L6-v2"   # Lightweight, fast model
    device: "cpu"                    # Use "cuda" if a GPU is available
    batch_size: 32
    max_seq_length: 512

  # Search parameters
  search:
    top_k: 5                         # Number of chunks to retrieve
    similarity_threshold: 0.3        # Minimum similarity score (0.0-1.0)
    enable_keyword_fallback: true    # Fall back to keyword search
    max_chunk_size: 1000             # Maximum characters per chunk
    chunk_overlap: 200               # Overlap between chunks

  # Context integration
  context:
    strategy: "combined"             # "knowledge_only", "narrative_priority", "combined"
    max_context_length: 8000         # Maximum total context length
    knowledge_weight: 0.3            # Knowledge influence (0.0-1.0)
    integration_method: "system_prompt"  # "system_prompt" or "user_message"

  # Performance optimization
  performance:
    enable_caching: true             # Cache embeddings and search results
    cache_dir: "cache/knowledge"     # Cache directory
    cache_expiry_hours: 24           # Cache expiration (0 = never)
    max_memory_mb: 512               # Maximum memory for embeddings
    lazy_loading: true               # Load embeddings on demand

  # Reliability settings
  fallback:
    graceful_degradation: true       # Continue if the knowledge base fails
    use_keyword_fallback: true       # Use keyword matching as a fallback
    log_errors_only: true            # Log errors but don't fail the process
```
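When semantic search is unavailable (e.g. `sentence-transformers` is not installed), the `enable_keyword_fallback` setting suggests the system falls back to keyword matching. A plain word-overlap ranker sketches what such a fallback might look like; the function name and scoring are assumptions, not the project's actual implementation:

```python
def keyword_search(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    """Rank chunks by word overlap with the query -- a sketch of a
    keyword fallback for when embedding-based search is unavailable."""
    query_words = set(query.lower().split())
    scored = []
    for chunk in chunks:
        overlap = len(query_words & set(chunk.lower().split()))
        if overlap:
            scored.append((overlap, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```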
```yaml
context:
  strategy: "knowledge_only"
```

Best for: technical documentation, product specifications, reference materials.

```yaml
context:
  strategy: "narrative_priority"
  knowledge_weight: 0.2
```

Best for: storytelling presentations, educational sequences, marketing narratives.

```yaml
context:
  strategy: "combined"
  knowledge_weight: 0.3
```

Best for: most presentations, mixed content types, general use cases.
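One plausible reading of the `"combined"` strategy is a length budget split between narrative and knowledge context according to `knowledge_weight`, capped at `max_context_length`. The sketch below illustrates that interpretation (an assumption about the mechanics, not the project's actual code):

```python
def build_context(narrative: str, knowledge: str,
                  knowledge_weight: float = 0.3,
                  max_context_length: int = 8000) -> str:
    """Combine narrative and knowledge context under one total length
    budget, split by knowledge_weight (the "combined" strategy)."""
    knowledge_budget = int(max_context_length * knowledge_weight)
    narrative_budget = max_context_length - knowledge_budget
    parts = (narrative[:narrative_budget], knowledge[:knowledge_budget])
    return "\n\n".join(part for part in parts if part)
```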
Main configuration (`config.yaml`):

```yaml
# API Configuration
api:
  llama_model: "Llama-4-Maverick-17B-128E-Instruct-FP8"
  max_retries: 3
  retry_delay: 1
  rate_limit_delay: 1

# Processing Configuration
processing:
  default_dpi: 200
  supported_formats: ["png", "jpeg", "jpg"]
  default_format: "png"
  batch_size: 5

# File Paths
paths:
  default_output_dir: "slide_images"
  cache_dir: "cache"
  logs_dir: "logs"
  temp_dir: "temp"

# Current Project Settings
current_project:
  pptx_file: "input/your_presentation"
  extension: ".pptx"
  output_dir: "output/"

# Knowledge Base Configuration (see the knowledge base section for details)
knowledge:
  enabled: true
  knowledge_base_dir: "knowledge_base"
  # ... additional knowledge settings

# Logging Configuration
logging:
  level: "INFO"
  format: "%(asctime)s - %(levelname)s - %(message)s"
  file_enabled: true
  console_enabled: true
```
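The `max_retries` and `retry_delay` settings imply a retry loop around API calls. A generic version of that pattern (a sketch of the idea, not the project's `groq_client.py`):

```python
import time

def call_with_retries(fn, max_retries: int = 3, retry_delay: float = 1.0):
    """Retry a flaky callable, sleeping retry_delay seconds between
    attempts -- mirroring max_retries / retry_delay in config.yaml."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as err:  # narrow this to API errors in real code
            last_error = err
            if attempt < max_retries - 1:
                time.sleep(retry_delay)
    raise last_error
```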
Environment variables (`.env`):

```bash
# Required
GROQ_API_KEY=your_groq_api_key_here

# Optional
LOG_LEVEL=INFO
CACHE_ENABLED=true
```
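The `.env` file is typically loaded with a library such as python-dotenv; a minimal stand-in shows what that loading amounts to (illustrative only, and it deliberately does not override variables already set in the environment):

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Load KEY=value lines from a .env file into os.environ,
    skipping blanks, comments, and already-set variables."""
    if not os.path.exists(path):
        return
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```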
```python
# Basic mode: no narrative context, no knowledge base
processor = UnifiedTranscriptProcessor(
    use_narrative=False,
    enable_knowledge=False
)

# Knowledge-enhanced mode
processor = UnifiedTranscriptProcessor(
    use_narrative=False,
    enable_knowledge=True
)

# Narrative continuity mode
processor = UnifiedTranscriptProcessor(
    use_narrative=True,
    context_window_size=5,
    enable_knowledge=False
)

# Full mode: narrative continuity plus knowledge integration
processor = UnifiedTranscriptProcessor(
    use_narrative=True,
    context_window_size=5,
    enable_knowledge=True
)
```
```bash
# Clone the repository
git clone <repository-url>
cd powerpoint-to-voiceover-transcript

# Set up with uv (recommended)
uv sync
source .venv/bin/activate

# Or set up with pip
pip install -e .

# Install system dependencies
# macOS:   brew install --cask libreoffice
# Ubuntu:  sudo apt-get install libreoffice
# Windows: download from libreoffice.org
```
```yaml
knowledge:
  performance:
    max_memory_mb: 1024    # Adjust based on available RAM
    lazy_loading: true     # Load embeddings on demand
    enable_caching: true   # Cache for repeated processing

processing:
  batch_size: 10           # Process slides in batches
  default_dpi: 150         # Lower DPI for faster processing

api:
  max_retries: 5           # Increase retries for production
  retry_delay: 2           # Longer delays for stability
```
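The `batch_size` setting implies slides are processed in fixed-size groups rather than one request per run. Splitting a slide list into batches is a one-liner; a sketch of that step (illustrative, not the project's internals):

```python
def batched(items: list, batch_size: int = 5) -> list[list]:
    """Split a list of slides into fixed-size batches, as the
    processing.batch_size setting suggests the pipeline does."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```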
"LibreOffice not found"
# macOS
brew install --cask libreoffice
# Ubuntu/Debian
sudo apt-get install libreoffice
# Windows
# Download from https://www.libreoffice.org/download/
"uv sync fails"
# Ensure Python 3.12+ is available
uv python install 3.12
uv sync --python 3.12
"sentence_transformers not found"
# Install with uv
uv add sentence-transformers
# Or with pip
pip install sentence-transformers
# Restart Jupyter kernel after installation
"API key not found"
# Check .env file exists and contains key
cat .env | grep GROQ_API_KEY
# Or set environment variable directly
export GROQ_API_KEY=your_key_here
"Permission denied on output directory"
# Ensure write permissions
chmod 755 output/
mkdir -p output/
"Knowledge base not loading"
# Check directory exists and contains .md files
ls -la knowledge_base/
ls knowledge_base/*.md
# Verify configuration
grep -A 5 "knowledge:" config.yaml
"Processing too slow"
# Reduce context window size
context_window_size: 3
# Lower image quality
processing:
default_dpi: 150
# Disable knowledge base temporarily
knowledge:
enabled: false
"Memory usage too high"
knowledge:
performance:
max_memory_mb: 256
lazy_loading: true
search:
top_k: 3
max_chunk_size: 500
"Poor transcript quality"
# Increase knowledge retrieval
knowledge:
search:
top_k: 7
similarity_threshold: 0.2
# Increase context window
context_window_size: 7
"Inconsistent terminology"
use_narrative=Trueknowledge_weight: 0.4