|
@@ -7,19 +7,26 @@
|
|
|
"source": [
|
|
"source": [
|
|
|
"# PowerPoint to Narrative-Aware Voiceover Transcript Generator\n",
|
|
"# PowerPoint to Narrative-Aware Voiceover Transcript Generator\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
- "This notebook demonstrates the complete workflow for converting PowerPoint presentations into AI-generated voiceover transcripts with narrative continuity using Llama 4 Maverick through the Llama API.\n",
|
|
|
|
|
|
|
+ "This notebook demonstrates the complete workflow for converting PowerPoint presentations into AI-generated voiceover transcripts using the unified transcript processor with Llama 4 Maverick through the Llama API.\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
"## Overview\n",
|
|
"## Overview\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
- "This enhanced workflow performs the following operations:\n",
|
|
|
|
|
|
|
+ "This unified workflow performs the following operations:\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
"1. **Content Extraction**: Pulls speaker notes and visual elements from PowerPoint slides\n",
|
|
"1. **Content Extraction**: Pulls speaker notes and visual elements from PowerPoint slides\n",
|
|
|
"2. **Image Conversion**: Transforms slides into high-quality images for AI analysis\n",
|
|
"2. **Image Conversion**: Transforms slides into high-quality images for AI analysis\n",
|
|
|
- "3. **Narrative-Aware Processing**: Uses previous slide transcripts as context for continuity\n",
|
|
|
|
|
- "4. **Transcript Generation**: Creates natural-sounding voiceover content with smooth transitions\n",
|
|
|
|
|
|
|
+ "3. **Flexible Processing**: Choose between standard and narrative-aware processing modes\n",
|
|
|
|
|
+ "4. **Transcript Generation**: Creates natural-sounding voiceover content with optional narrative continuity\n",
|
|
|
"5. **Speech Optimization**: Converts numbers, technical terms, and abbreviations to spoken form\n",
|
|
"5. **Speech Optimization**: Converts numbers, technical terms, and abbreviations to spoken form\n",
|
|
|
"6. **Results Export**: Saves transcripts and context information in multiple formats\n",
|
|
"6. **Results Export**: Saves transcripts and context information in multiple formats\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
|
|
+ "## Key Features\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "- **Unified Processor**: Single class handles both standard and narrative-aware processing\n",
|
|
|
|
|
+ "- **Configurable Context**: Adjustable context window for narrative continuity\n",
|
|
|
|
|
+ "- **Mode Selection**: Toggle between standard and narrative processing with a simple flag\n",
|
|
|
|
|
+ "- **Backward Compatibility**: Maintains compatibility with existing workflows\n",
|
|
|
|
|
+ "\n",
|
|
|
"## Prerequisites\n",
|
|
"## Prerequisites\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
"Before running this notebook, ensure you have:\n",
|
|
"Before running this notebook, ensure you have:\n",
|
|
@@ -40,10 +47,19 @@
|
|
|
},
|
|
},
|
|
|
{
|
|
{
|
|
|
"cell_type": "code",
|
|
"cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
|
|
|
|
+ "execution_count": 19,
|
|
|
"id": "21a962b2",
|
|
"id": "21a962b2",
|
|
|
"metadata": {},
|
|
"metadata": {},
|
|
|
- "outputs": [],
|
|
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "SUCCESS: Environment loaded successfully!\n",
|
|
|
|
|
+ "SUCCESS: Llama API key found\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
"source": [
|
|
"source": [
|
|
|
"# Import required libraries\n",
|
|
"# Import required libraries\n",
|
|
|
"import pandas as pd\n",
|
|
"import pandas as pd\n",
|
|
@@ -67,23 +83,35 @@
|
|
|
},
|
|
},
|
|
|
{
|
|
{
|
|
|
"cell_type": "code",
|
|
"cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
|
|
|
|
+ "execution_count": 20,
|
|
|
"id": "71c1c8bd",
|
|
"id": "71c1c8bd",
|
|
|
"metadata": {},
|
|
"metadata": {},
|
|
|
- "outputs": [],
|
|
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "SUCCESS: All modules imported successfully!\n",
|
|
|
|
|
+ "- PPTX processor ready\n",
|
|
|
|
|
+ "- Unified transcript generator ready\n",
|
|
|
|
|
+ "- Configuration manager ready\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
"source": [
|
|
"source": [
|
|
|
"# Import custom modules\n",
|
|
"# Import custom modules\n",
|
|
|
"try:\n",
|
|
"try:\n",
|
|
|
" from src.core.pptx_processor import extract_pptx_notes, pptx_to_images_and_notes\n",
|
|
" from src.core.pptx_processor import extract_pptx_notes, pptx_to_images_and_notes\n",
|
|
|
- " from src.processors.narrative_transcript_generator import (\n",
|
|
|
|
|
- " NarrativeTranscriptProcessor,\n",
|
|
|
|
|
|
|
+ " from src.processors.unified_transcript_generator import (\n",
|
|
|
|
|
+ " UnifiedTranscriptProcessor,\n",
|
|
|
|
|
+ " process_slides,\n",
|
|
|
" process_slides_with_narrative\n",
|
|
" process_slides_with_narrative\n",
|
|
|
" )\n",
|
|
" )\n",
|
|
|
" from src.config.settings import load_config, get_config\n",
|
|
" from src.config.settings import load_config, get_config\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
" print(\"SUCCESS: All modules imported successfully!\")\n",
|
|
" print(\"SUCCESS: All modules imported successfully!\")\n",
|
|
|
" print(\"- PPTX processor ready\")\n",
|
|
" print(\"- PPTX processor ready\")\n",
|
|
|
- " print(\"- Narrative transcript generator ready\")\n",
|
|
|
|
|
|
|
+ " print(\"- Unified transcript generator ready\")\n",
|
|
|
" print(\"- Configuration manager ready\")\n",
|
|
" print(\"- Configuration manager ready\")\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
"except ImportError as e:\n",
|
|
"except ImportError as e:\n",
|
|
@@ -93,10 +121,24 @@
|
|
|
},
|
|
},
|
|
|
{
|
|
{
|
|
|
"cell_type": "code",
|
|
"cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
|
|
|
|
+ "execution_count": 21,
|
|
|
"id": "53781172",
|
|
"id": "53781172",
|
|
|
"metadata": {},
|
|
"metadata": {},
|
|
|
- "outputs": [],
|
|
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "SUCCESS: Configuration loaded successfully!\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Current Settings:\n",
|
|
|
|
|
+ "- Llama Model: Llama-4-Maverick-17B-128E-Instruct-FP8\n",
|
|
|
|
|
+ "- Image DPI: 200\n",
|
|
|
|
|
+ "- Image Format: png\n",
|
|
|
|
|
+ "- Context Window: 5 previous slides (default)\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
"source": [
|
|
"source": [
|
|
|
"# Load and display configuration\n",
|
|
"# Load and display configuration\n",
|
|
|
"config = load_config()\n",
|
|
"config = load_config()\n",
|
|
@@ -110,10 +152,22 @@
|
|
|
},
|
|
},
|
|
|
{
|
|
{
|
|
|
"cell_type": "code",
|
|
"cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
|
|
|
|
+ "execution_count": 22,
|
|
|
"id": "9386e035",
|
|
"id": "9386e035",
|
|
|
"metadata": {},
|
|
"metadata": {},
|
|
|
- "outputs": [],
|
|
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "File Configuration:\n",
|
|
|
|
|
+ "- Input File: input/Part 1 Llama Certification V2 PPT Outline.pptx\n",
|
|
|
|
|
+ "- Output Directory: output/\n",
|
|
|
|
|
+ "- SUCCESS: Input file found (80.4 MB)\n",
|
|
|
|
|
+ "- SUCCESS: Output directory ready\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
"source": [
|
|
"source": [
|
|
|
"# Configure file paths from config.yaml\n",
|
|
"# Configure file paths from config.yaml\n",
|
|
|
"pptx_file = config['current_project']['pptx_file'] + config['current_project']['extension']\n",
|
|
"pptx_file = config['current_project']['pptx_file'] + config['current_project']['extension']\n",
|
|
@@ -138,6 +192,53 @@
|
|
|
},
|
|
},
|
|
|
{
|
|
{
|
|
|
"cell_type": "markdown",
|
|
"cell_type": "markdown",
|
|
|
|
|
+ "id": "35a9e13a-4f85-488e-880b-62c7512d1248",
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "## Processing Mode Configuration\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Choose your processing mode and configure the unified processor."
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "code",
|
|
|
|
|
+ "execution_count": 25,
|
|
|
|
|
+ "id": "6fbfcb28-2f09-4497-8098-35cf3d62ebf3",
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "Processing Mode Configuration:\n",
|
|
|
|
|
+ "- Mode: NARRATIVE CONTINUITY\n",
|
|
|
|
|
+ "- Context Window: 5 previous slides\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "# Configure processing mode\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "USE_NARRATIVE = True # Set to False for standard processing, True for narrative continuity\n",
|
|
|
|
|
+ "CONTEXT_WINDOW_SIZE = 5 # Number of previous slides to use as context (only used when USE_NARRATIVE=True)\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "print(\"Processing Mode Configuration:\")\n",
|
|
|
|
|
+ "if USE_NARRATIVE:\n",
|
|
|
|
|
+ "    print(\"- Mode: NARRATIVE CONTINUITY\")\n",
|
|
|
|
|
+ " print(f\"- Context Window: {CONTEXT_WINDOW_SIZE} previous slides\")\n",
|
|
|
|
|
+ "else:\n",
|
|
|
|
|
+ "    print(\"- Mode: STANDARD PROCESSING\")\n",
|
|
|
|
|
+ "    print(\"- Features: Independent slide processing, faster execution\")\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "# Initialize the unified processor\n",
|
|
|
|
|
+ "processor = UnifiedTranscriptProcessor(\n",
|
|
|
|
|
+ " use_narrative=USE_NARRATIVE,\n",
|
|
|
|
|
+ " context_window_size=CONTEXT_WINDOW_SIZE\n",
|
|
|
|
|
+ ")"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "markdown",
|
|
|
"id": "ea4851e6",
|
|
"id": "ea4851e6",
|
|
|
"metadata": {},
|
|
"metadata": {},
|
|
|
"source": [
|
|
"source": [
|
|
@@ -159,10 +260,131 @@
|
|
|
},
|
|
},
|
|
|
{
|
|
{
|
|
|
"cell_type": "code",
|
|
"cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
|
|
|
|
+ "execution_count": 26,
|
|
|
"id": "644ee94c",
|
|
"id": "644ee94c",
|
|
|
"metadata": {},
|
|
"metadata": {},
|
|
|
- "outputs": [],
|
|
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "PROCESSING: Converting PPTX to images and extracting notes...\n",
|
|
|
|
|
+ "Processing: Part 1 Llama Certification V2 PPT Outline.pptx\n",
|
|
|
|
|
+ "Extracting speaker notes...\n",
|
|
|
|
|
+ "Found notes on 41 of 102 slides\n",
|
|
|
|
|
+ "Notes df saved to: /Users/yucedincer/Desktop/Projects/llama-cookbook/end-to-end-use-cases/powerpoint-to-voiceover-transcript/output/Part 1 Llama Certification V2 PPT Outline_notes.csv\n",
|
|
|
|
|
+ "Converting to PDF...\n",
|
|
|
|
|
+ "Converting to PNG images at 200 DPI...\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Successfully processed 102 slides\n",
|
|
|
|
|
+ "Images saved to: /Users/yucedincer/Desktop/Projects/llama-cookbook/end-to-end-use-cases/powerpoint-to-voiceover-transcript/output\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "SUCCESS: Processing completed successfully!\n",
|
|
|
|
|
+ "- Processed 102 slides\n",
|
|
|
|
|
+ "- Images saved to: /Users/yucedincer/Desktop/Projects/llama-cookbook/end-to-end-use-cases/powerpoint-to-voiceover-transcript/output\n",
|
|
|
|
|
+ "- Found notes on 41 slides\n",
|
|
|
|
|
+ "- DataFrame shape: (102, 8)\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Sample Data (First 5 slides):\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "data": {
|
|
|
|
|
+ "text/html": [
|
|
|
|
|
+ "<div>\n",
|
|
|
|
|
+ "<style scoped>\n",
|
|
|
|
|
+ " .dataframe tbody tr th:only-of-type {\n",
|
|
|
|
|
+ " vertical-align: middle;\n",
|
|
|
|
|
+ " }\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ " .dataframe tbody tr th {\n",
|
|
|
|
|
+ " vertical-align: top;\n",
|
|
|
|
|
+ " }\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ " .dataframe thead th {\n",
|
|
|
|
|
+ " text-align: right;\n",
|
|
|
|
|
+ " }\n",
|
|
|
|
|
+ "</style>\n",
|
|
|
|
|
+ "<table border=\"1\" class=\"dataframe\">\n",
|
|
|
|
|
+ " <thead>\n",
|
|
|
|
|
+ " <tr style=\"text-align: right;\">\n",
|
|
|
|
|
+ " <th></th>\n",
|
|
|
|
|
+ " <th>slide_number</th>\n",
|
|
|
|
|
+ " <th>slide_title</th>\n",
|
|
|
|
|
+ " <th>has_notes</th>\n",
|
|
|
|
|
+ " <th>notes_word_count</th>\n",
|
|
|
|
|
+ " <th>slide_text_word_count</th>\n",
|
|
|
|
|
+ " </tr>\n",
|
|
|
|
|
+ " </thead>\n",
|
|
|
|
|
+ " <tbody>\n",
|
|
|
|
|
+ " <tr>\n",
|
|
|
|
|
+ " <th>0</th>\n",
|
|
|
|
|
+ " <td>1</td>\n",
|
|
|
|
|
+ " <td>APRIL 2025</td>\n",
|
|
|
|
|
+ " <td>False</td>\n",
|
|
|
|
|
+ " <td>0</td>\n",
|
|
|
|
|
+ " <td>6</td>\n",
|
|
|
|
|
+ " </tr>\n",
|
|
|
|
|
+ " <tr>\n",
|
|
|
|
|
+ " <th>1</th>\n",
|
|
|
|
|
+ " <td>2</td>\n",
|
|
|
|
|
+ " <td>Module 1: \u000b",
|
|
|
|
|
+ "Introduction to Llama</td>\n",
|
|
|
|
|
+ " <td>False</td>\n",
|
|
|
|
|
+ " <td>0</td>\n",
|
|
|
|
|
+ " <td>41</td>\n",
|
|
|
|
|
+ " </tr>\n",
|
|
|
|
|
+ " <tr>\n",
|
|
|
|
|
+ " <th>2</th>\n",
|
|
|
|
|
+ " <td>3</td>\n",
|
|
|
|
|
+ " <td>Video 1: Overview of Llama</td>\n",
|
|
|
|
|
+ " <td>False</td>\n",
|
|
|
|
|
+ " <td>0</td>\n",
|
|
|
|
|
+ " <td>15</td>\n",
|
|
|
|
|
+ " </tr>\n",
|
|
|
|
|
+ " <tr>\n",
|
|
|
|
|
+ " <th>3</th>\n",
|
|
|
|
|
+ " <td>4</td>\n",
|
|
|
|
|
+ " <td>Artificial intelligence (AI)</td>\n",
|
|
|
|
|
+ " <td>True</td>\n",
|
|
|
|
|
+ " <td>243</td>\n",
|
|
|
|
|
+ " <td>34</td>\n",
|
|
|
|
|
+ " </tr>\n",
|
|
|
|
|
+ " <tr>\n",
|
|
|
|
|
+ " <th>4</th>\n",
|
|
|
|
|
+ " <td>5</td>\n",
|
|
|
|
|
+ " <td>Leverage Llama for unparalleled \u000b",
|
|
|
|
|
+ "control, cust...</td>\n",
|
|
|
|
|
+ " <td>True</td>\n",
|
|
|
|
|
+ " <td>244</td>\n",
|
|
|
|
|
+ " <td>147</td>\n",
|
|
|
|
|
+ " </tr>\n",
|
|
|
|
|
+ " </tbody>\n",
|
|
|
|
|
+ "</table>\n",
|
|
|
|
|
+ "</div>"
|
|
|
|
|
+ ],
|
|
|
|
|
+ "text/plain": [
|
|
|
|
|
+ " slide_number slide_title has_notes \\\n",
|
|
|
|
|
+ "0 1 APRIL 2025 False \n",
|
|
|
|
|
+ "1 2 Module 1: \n",
|
|
|
|
|
+ "Introduction to Llama False \n",
|
|
|
|
|
+ "2 3 Video 1: Overview of Llama False \n",
|
|
|
|
|
+ "3 4 Artificial intelligence (AI) True \n",
|
|
|
|
|
+ "4 5 Leverage Llama for unparalleled \n",
|
|
|
|
|
+ "control, cust... True \n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ " notes_word_count slide_text_word_count \n",
|
|
|
|
|
+ "0 0 6 \n",
|
|
|
|
|
+ "1 0 41 \n",
|
|
|
|
|
+ "2 0 15 \n",
|
|
|
|
|
+ "3 243 34 \n",
|
|
|
|
|
+ "4 244 147 "
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "output_type": "display_data"
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
"source": [
|
|
"source": [
|
|
|
"print(\"PROCESSING: Converting PPTX to images and extracting notes...\")\n",
|
|
"print(\"PROCESSING: Converting PPTX to images and extracting notes...\")\n",
|
|
|
"\n",
|
|
"\n",
|
|
@@ -206,31 +428,79 @@
|
|
|
},
|
|
},
|
|
|
{
|
|
{
|
|
|
"cell_type": "code",
|
|
"cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
|
|
|
|
+ "execution_count": 27,
|
|
|
"id": "fe564b99",
|
|
"id": "fe564b99",
|
|
|
"metadata": {},
|
|
"metadata": {},
|
|
|
- "outputs": [],
|
|
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "PROCESSING: Starting AI transcript generation with unified processor...\n",
|
|
|
|
|
+ "- Processing 102 slides\n",
|
|
|
|
|
+ "- Using model: Llama-4-Maverick-17B-128E-Instruct-FP8\n",
|
|
|
|
|
+ "- Mode: Narrative Continuity\n",
|
|
|
|
|
+ "- Context window: 5 previous slides\n",
|
|
|
|
|
+ "- Using previous transcripts as context for narrative continuity\n",
|
|
|
|
|
+ "- This may take several minutes...\n",
|
|
|
|
|
+ "Processing 102 slides with narrative continuity...\n",
|
|
|
|
|
+ "Using context window of 5 previous slides\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stderr",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "Processing slides: 100%|██████████████████████| 102/102 [06:03<00:00, 3.56s/it]"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "Context information saved to: output/narrative_context\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "SUCCESS: Transcript generation completed!\n",
|
|
|
|
|
+ "- Generated 102 transcripts\n",
|
|
|
|
|
+ "- Average length: 1137 characters\n",
|
|
|
|
|
+ "- Total words: 16,950\n",
|
|
|
|
|
+ "- Context information saved to: output/narrative_context/\n",
|
|
|
|
|
+ "- Average context slides used: 4.9\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stderr",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
"source": [
|
|
"source": [
|
|
|
- "print(\"PROCESSING: Starting narrative-aware AI transcript generation...\")\n",
|
|
|
|
|
|
|
+ "print(\"PROCESSING: Starting AI transcript generation with unified processor...\")\n",
|
|
|
"print(f\"- Processing {len(notes_df)} slides\")\n",
|
|
"print(f\"- Processing {len(notes_df)} slides\")\n",
|
|
|
"print(f\"- Using model: {config['api']['llama_model']}\")\n",
|
|
"print(f\"- Using model: {config['api']['llama_model']}\")\n",
|
|
|
- "print(f\"- Context window: 5 previous slides\")\n",
|
|
|
|
|
- "print(f\"- Using previous transcripts as context for narrative continuity\")\n",
|
|
|
|
|
|
|
+ "print(f\"- Mode: {'Narrative Continuity' if USE_NARRATIVE else 'Standard Processing'}\")\n",
|
|
|
|
|
+ "if USE_NARRATIVE:\n",
|
|
|
|
|
+ " print(f\"- Context window: {CONTEXT_WINDOW_SIZE} previous slides\")\n",
|
|
|
|
|
+ "    print(\"- Using previous transcripts as context for narrative continuity\")\n",
|
|
|
"print(\"- This may take several minutes...\")\n",
|
|
"print(\"- This may take several minutes...\")\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
- "# Initialize processor and generate transcripts with narrative continuity\n",
|
|
|
|
|
- "processor = NarrativeTranscriptProcessor(context_window_size=5)\n",
|
|
|
|
|
- "processed_df = processor.process_slides_dataframe_with_narrative(\n",
|
|
|
|
|
|
|
+ "# Generate transcripts using the unified processor\n",
|
|
|
|
|
+ "processed_df = processor.process_slides_dataframe(\n",
|
|
|
" df=notes_df,\n",
|
|
" df=notes_df,\n",
|
|
|
" output_dir=output_dir,\n",
|
|
" output_dir=output_dir,\n",
|
|
|
- " save_context=True\n",
|
|
|
|
|
|
|
+ " save_context=True # Only saves context if USE_NARRATIVE=True\n",
|
|
|
")\n",
|
|
")\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
- "print(f\"\\nSUCCESS: Narrative-aware transcript generation completed!\")\n",
|
|
|
|
|
|
|
+ "print(\"\\nSUCCESS: Transcript generation completed!\")\n",
|
|
|
"print(f\"- Generated {len(processed_df)} transcripts\")\n",
|
|
"print(f\"- Generated {len(processed_df)} transcripts\")\n",
|
|
|
"print(f\"- Average length: {processed_df['ai_transcript'].str.len().mean():.0f} characters\")\n",
|
|
"print(f\"- Average length: {processed_df['ai_transcript'].str.len().mean():.0f} characters\")\n",
|
|
|
"print(f\"- Total words: {processed_df['ai_transcript'].str.split().str.len().sum():,}\")\n",
|
|
"print(f\"- Total words: {processed_df['ai_transcript'].str.split().str.len().sum():,}\")\n",
|
|
|
- "print(f\"- Context information saved to: {output_dir}narrative_context/\")"
|
|
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "if USE_NARRATIVE:\n",
|
|
|
|
|
+ " print(f\"- Context information saved to: {output_dir}narrative_context/\")\n",
|
|
|
|
|
+ " print(f\"- Average context slides used: {processed_df['context_slides_used'].mean():.1f}\")"
|
|
|
]
|
|
]
|
|
|
},
|
|
},
|
|
|
{
|
|
{
|
|
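The narrative run in the hunk above reports `Average context slides used: 4.9` for 102 slides with a context window of 5. The following is a minimal sketch of why that figure is plausible. It assumes each slide draws on at most `CONTEXT_WINDOW_SIZE` immediately preceding slides, so the first few slides see fewer. `context_indices` and `average_context_used` are hypothetical helpers for illustration only, not functions from `UnifiedTranscriptProcessor`, whose internals are not shown in this diff.

```python
from typing import List


def context_indices(slide_index: int, window: int) -> List[int]:
    """Return zero-based indices of the previous slides used as context.

    Assumption: a slide uses up to `window` immediately preceding slides.
    """
    start = max(0, slide_index - window)
    return list(range(start, slide_index))


def average_context_used(num_slides: int, window: int) -> float:
    """Average number of context slides per slide across the whole deck."""
    counts = [len(context_indices(i, window)) for i in range(num_slides)]
    return sum(counts) / num_slides


# Slide counts and window size taken from the run output above.
print(f"{average_context_used(102, 5):.1f}")  # prints 4.9
```

Under this assumption the first five slides contribute 0, 1, 2, 3 and 4 context slides and the remaining 97 contribute 5 each, which gives 495 / 102, or about 4.85. A larger window would lower the average relative to its maximum on short decks, since more slides fall in the warm-up region.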
@@ -245,29 +515,56 @@
|
|
|
},
|
|
},
|
|
|
{
|
|
{
|
|
|
"cell_type": "code",
|
|
"cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
|
|
|
|
+ "execution_count": 28,
|
|
|
"id": "8463ac3a",
|
|
"id": "8463ac3a",
|
|
|
"metadata": {},
|
|
"metadata": {},
|
|
|
- "outputs": [],
|
|
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "PROCESSING: Saving results in multiple formats...\n",
|
|
|
|
|
+ "- SUCCESS: Complete results saved to output/narrative_transcripts.csv\n",
|
|
|
|
|
+ "- SUCCESS: Clean transcripts saved to output/narrative_transcripts_clean.csv\n",
|
|
|
|
|
+ "- SUCCESS: JSON format saved to output/narrative_transcripts.json\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Export Summary:\n",
|
|
|
|
|
+ "- Processing mode: Narrative Continuity\n",
|
|
|
|
|
+ "- Total slides processed: 102\n",
|
|
|
|
|
+ "- Slides with speaker notes: 41\n",
|
|
|
|
|
+ "- Total transcript words: 16,950\n",
|
|
|
|
|
+ "- Average transcript length: 1137 characters\n",
|
|
|
|
|
+ "- Estimated reading time: 113.0 minutes\n",
|
|
|
|
|
+ "- Average context slides per slide: 4.9\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
"source": [
|
|
"source": [
|
|
|
"print(\"PROCESSING: Saving results in multiple formats...\")\n",
|
|
"print(\"PROCESSING: Saving results in multiple formats...\")\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
"# Create output directory\n",
|
|
"# Create output directory\n",
|
|
|
"os.makedirs(output_dir, exist_ok=True)\n",
|
|
"os.makedirs(output_dir, exist_ok=True)\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
|
|
+ "# Determine file prefix based on processing mode\n",
|
|
|
|
|
+ "file_prefix = \"narrative\" if USE_NARRATIVE else \"standard\"\n",
|
|
|
|
|
+ "\n",
|
|
|
"# Save complete results with all metadata\n",
|
|
"# Save complete results with all metadata\n",
|
|
|
- "output_file = f\"{output_dir}narrative_transcripts.csv\"\n",
|
|
|
|
|
|
|
+ "output_file = f\"{output_dir}{file_prefix}_transcripts.csv\"\n",
|
|
|
"processed_df.to_csv(output_file, index=False)\n",
|
|
"processed_df.to_csv(output_file, index=False)\n",
|
|
|
"print(f\"- SUCCESS: Complete results saved to {output_file}\")\n",
|
|
"print(f\"- SUCCESS: Complete results saved to {output_file}\")\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
"# Save transcript-only version for voiceover work\n",
|
|
"# Save transcript-only version for voiceover work\n",
|
|
|
- "transcript_only = processed_df[['slide_number', 'slide_title', 'ai_transcript', 'context_slides_used']]\n",
|
|
|
|
|
- "transcript_file = f\"{output_dir}narrative_transcripts_clean.csv\"\n",
|
|
|
|
|
|
|
+ "if USE_NARRATIVE:\n",
|
|
|
|
|
+ " transcript_only = processed_df[['slide_number', 'slide_title', 'ai_transcript', 'context_slides_used']]\n",
|
|
|
|
|
+ "else:\n",
|
|
|
|
|
+ " transcript_only = processed_df[['slide_number', 'slide_title', 'ai_transcript']]\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "transcript_file = f\"{output_dir}{file_prefix}_transcripts_clean.csv\"\n",
|
|
|
"transcript_only.to_csv(transcript_file, index=False)\n",
|
|
"transcript_only.to_csv(transcript_file, index=False)\n",
|
|
|
"print(f\"- SUCCESS: Clean transcripts saved to {transcript_file}\")\n",
|
|
"print(f\"- SUCCESS: Clean transcripts saved to {transcript_file}\")\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
"# Save as JSON for API integration\n",
|
|
"# Save as JSON for API integration\n",
|
|
|
- "json_file = f\"{output_dir}narrative_transcripts.json\"\n",
|
|
|
|
|
|
|
+ "json_file = f\"{output_dir}{file_prefix}_transcripts.json\"\n",
|
|
|
"processed_df.to_json(json_file, orient='records', indent=2)\n",
|
|
"processed_df.to_json(json_file, orient='records', indent=2)\n",
|
|
|
"print(f\"- SUCCESS: JSON format saved to {json_file}\")\n",
|
|
"print(f\"- SUCCESS: JSON format saved to {json_file}\")\n",
|
|
|
"\n",
|
|
"\n",
|
|
@@ -276,12 +573,15 @@
|
|
|
"reading_time = total_words / 150 # Assuming 150 words per minute\n",
|
|
"reading_time = total_words / 150 # Assuming 150 words per minute\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
"print(f\"\\nExport Summary:\")\n",
|
|
"print(f\"\\nExport Summary:\")\n",
|
|
|
|
|
+ "print(f\"- Processing mode: {'Narrative Continuity' if USE_NARRATIVE else 'Standard Processing'}\")\n",
|
|
|
"print(f\"- Total slides processed: {len(processed_df)}\")\n",
|
|
"print(f\"- Total slides processed: {len(processed_df)}\")\n",
|
|
|
"print(f\"- Slides with speaker notes: {processed_df['has_notes'].sum()}\")\n",
|
|
"print(f\"- Slides with speaker notes: {processed_df['has_notes'].sum()}\")\n",
|
|
|
"print(f\"- Total transcript words: {total_words:,}\")\n",
|
|
"print(f\"- Total transcript words: {total_words:,}\")\n",
|
|
|
"print(f\"- Average transcript length: {processed_df['ai_transcript'].str.len().mean():.0f} characters\")\n",
|
|
"print(f\"- Average transcript length: {processed_df['ai_transcript'].str.len().mean():.0f} characters\")\n",
|
|
|
"print(f\"- Estimated reading time: {reading_time:.1f} minutes\")\n",
|
|
"print(f\"- Estimated reading time: {reading_time:.1f} minutes\")\n",
|
|
|
- "print(f\"- Average context slides per slide: {processed_df['context_slides_used'].mean():.1f}\")"
|
|
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "if USE_NARRATIVE and 'context_slides_used' in processed_df.columns:\n",
|
|
|
|
|
+ " print(f\"- Average context slides per slide: {processed_df['context_slides_used'].mean():.1f}\")"
|
|
|
]
|
|
]
|
|
|
},
|
|
},
|
|
|
{
|
|
{
|
|
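The export summary in the hunk above estimates reading time by dividing the total word count by 150 words per minute. The following is a small sketch of that arithmetic. `estimate_reading_minutes` is a hypothetical helper name introduced here, and the 16,950-word figure is taken from the recorded run output.

```python
def estimate_reading_minutes(total_words: int, words_per_minute: int = 150) -> float:
    """Estimate narration time in minutes at a fixed speaking rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return total_words / words_per_minute


# Word count taken from the run output above.
print(f"{estimate_reading_minutes(16950):.1f} minutes")  # prints 113.0 minutes
```

The 150 words-per-minute rate is the notebook's own assumption. A slower narration pace would scale the estimate up proportionally.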
@@ -293,31 +593,44 @@
|
|
|
"# Completion Summary\n",
|
|
"# Completion Summary\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
"## Successfully Generated:\n",
|
|
"## Successfully Generated:\n",
|
|
|
- "- **Narrative-Aware Transcripts**: Context-aware voiceover content with smooth transitions\n",
|
|
|
|
|
- "- **Consistent Terminology**: Maintained terminology consistency throughout presentation\n",
|
|
|
|
|
|
|
+ "- **Unified Processing**: Single processor handles both standard and narrative modes\n",
|
|
|
|
|
+ "- **Flexible Configuration**: Easy switching between processing modes\n",
|
|
|
|
|
+ "- **Speech-Optimized Transcripts**: Natural-sounding voiceover content\n",
|
|
|
"- **Multiple Formats**: CSV, JSON exports for different use cases\n",
|
|
"- **Multiple Formats**: CSV, JSON exports for different use cases\n",
|
|
|
- "- **Context Analysis**: Detailed information about narrative flow and relationships\n",
|
|
|
|
|
|
|
+ "- **Context Analysis**: Detailed information about narrative flow (when enabled)\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
"## Output Files:\n",
|
|
"## Output Files:\n",
|
|
|
- "- `narrative_transcripts.csv` - Complete dataset with context information\n",
|
|
|
|
|
- "- `narrative_transcripts_clean.csv` - Clean transcripts for voiceover work\n",
|
|
|
|
|
- "- `narrative_transcripts.json` - JSON format for API integration\n",
|
|
|
|
|
- "- `narrative_context/slide_contexts.json` - Individual slide context data\n",
|
|
|
|
|
- "- `narrative_context/narrative_summary.json` - Overall narrative analysis\n",
|
|
|
|
|
|
|
+ "- `[mode]_transcripts.csv` - Complete dataset with metadata (`[mode]` is `narrative` or `standard`)\n",
|
|
|
|
|
+ "- `[mode]_transcripts_clean.csv` - Clean transcripts for voiceover work\n",
|
|
|
|
|
+ "- `[mode]_transcripts.json` - JSON format for API integration\n",
|
|
|
|
|
+ "- `narrative_context/` - Context analysis files (narrative mode only)\n",
|
|
|
"- Individual slide images in PNG/JPEG format\n",
|
|
"- Individual slide images in PNG/JPEG format\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
|
|
+ "## Processing Modes:\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "### Standard Mode (`USE_NARRATIVE = False`)\n",
|
|
|
|
|
+ "- **Best for**: Simple presentations, quick processing, independent slides\n",
|
|
|
|
|
+ "- **Features**: Fast execution, no context dependencies\n",
|
|
|
|
|
+ "- **Use cases**: Training materials, product demos, standalone slides\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "### Narrative Mode (`USE_NARRATIVE = True`)\n",
|
|
|
|
|
+ "- **Best for**: Story-driven presentations, complex topics, educational content\n",
|
|
|
|
|
+ "- **Features**: Context awareness, smooth transitions, terminology consistency\n",
|
|
|
|
|
+ "- **Use cases**: Conference talks, educational courses, marketing presentations\n",
|
|
|
|
|
+ "\n",
|
|
|
"## Next Steps:\n",
|
|
"## Next Steps:\n",
|
|
|
- "1. **Review** generated transcripts for narrative flow and accuracy\n",
|
|
|
|
|
|
|
+ "1. **Review** generated transcripts for accuracy and flow\n",
|
|
|
"2. **Edit** any content that needs refinement\n",
|
|
"2. **Edit** any content that needs refinement\n",
|
|
|
"3. **Create** voiceover recordings or use TTS systems\n",
|
|
"3. **Create** voiceover recordings or use TTS systems\n",
|
|
|
"4. **Integrate** JSON data into your video production workflow\n",
|
|
"4. **Integrate** JSON data into your video production workflow\n",
|
|
|
|
|
+ "5. **Experiment** with different processing modes for optimal results\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
"## Tips for Better Results:\n",
|
|
"## Tips for Better Results:\n",
|
|
|
- "- **Rich Speaker Notes**: Slides with detailed notes generate better contextual transcripts\n",
|
|
|
|
|
|
|
+ "- **Rich Speaker Notes**: Detailed notes improve transcript quality in both modes\n",
|
|
|
"- **Clear Visuals**: High-contrast slides with readable text work best\n",
|
|
"- **Clear Visuals**: High-contrast slides with readable text work best\n",
|
|
|
- "- **Consistent Style**: Maintain consistent formatting across your presentation\n",
|
|
|
|
|
|
|
+ "- **Mode Selection**: Use narrative mode for complex presentations, standard for simple ones\n",
|
|
|
"- **Context Window**: Adjust context window size (3-7 slides) based on presentation complexity\n",
|
|
"- **Context Window**: Adjust context window size (3-7 slides) based on presentation complexity\n",
|
|
|
- "- **Review Context**: Check the narrative_context files to understand how continuity was maintained\n",
|
|
|
|
|
|
|
+ "- **Consistent Style**: Maintain consistent formatting across your presentation\n",
|
|
|
"\n",
|
|
"\n",
|
|
|
"---"
|
|
"---"
|
|
|
]
|
|
]
|