NotebookLlama: An Open Source version of NotebookLM

Author: Sanyam Bhutani

This is a guided series of tutorials/notebooks that can be taken as a reference or course to build a PDF to Podcast workflow.

Here is the outline:

  • Step 1: Pre-process PDF: Use Llama-3.2-1B to pre-process and save a PDF
  • Step 2: Transcript Writer: Use Llama-3.1-70B model to write a podcast transcript from the text
  • Step 3: Dramatic Re-Writer: Use Llama-3.1-8B model to make the transcript more dramatic
  • Step 4: Text-To-Speech Workflow: Use parler-tts/parler-tts-mini-v1 and suno/bark to generate a conversational podcast
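The way the steps chain together can be sketched roughly as below. This is only an illustration of the data flow, not the notebook code: the `Stage` class, `run_pipeline`, and the placeholder lambdas are hypothetical stand-ins for the real model calls, and Step 4 is left out because it produces audio rather than text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    model: str  # model name as written in the outline above
    run: Callable[[str], str]  # hypothetical: takes the previous stage's text, returns new text


def run_pipeline(stages: List[Stage], source: str) -> str:
    """Feed the output of each stage into the next one."""
    data = source
    for stage in stages:
        data = stage.run(data)
    return data


# Placeholder callables stand in for the real notebook logic (assumption).
stages = [
    Stage("pre-process", "Llama-3.2-1B", lambda text: text.strip()),
    Stage("transcript", "Llama-3.1-70B", lambda text: f"Speaker 1: {text}"),
    Stage("re-write", "Llama-3.1-8B", lambda text: text.upper()),
]

print(run_pipeline(stages, "  some extracted PDF text  "))
```

Keeping each step behind the same text-in, text-out shape is one way to make it easy to swap a model for a single step without touching the others.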

Steps to run the notebooks:

TODO

Next-Improvements/Further ideas:

  • Speech Model experimentation: The TTS model is the limiting factor for how natural this will sound. This can probably be improved with a better pipeline
  • LLM vs LLM Debate: Another approach to writing the podcast would be to have two agents debate the topic of interest and write the podcast outline. Right now we use a single LLM (70B) to write the podcast outline
  • Testing 405B for writing the transcripts
  • Better prompting
  • Support for ingesting a website, audio file, YouTube links and more. We welcome community PRs!

Scratch-pad/Running Notes:

So right now there is one issue: Parler needs transformers 4.43.3 or earlier, while generation needs the latest version, so I am just switching between the two on the fly
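A small check like the one below can tell you which side of that version split the current environment is on before you load Parler. It is a minimal sketch: the 4.43.3 ceiling comes from the note above, while `parse_version` and `parler_compatible` are hypothetical helper names, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

PARLER_MAX_TRANSFORMERS = (4, 43, 3)  # ceiling taken from the note above


def parse_version(text: str) -> tuple:
    """Turn '4.43.3' or '4.45.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(piece))
    return tuple(parts)


def parler_compatible(installed: str) -> bool:
    """True if the given transformers version is at or below the Parler ceiling."""
    return parse_version(installed) <= PARLER_MAX_TRANSFORMERS


try:
    installed = version("transformers")
    print(installed, "works with Parler:", parler_compatible(installed))
except PackageNotFoundError:
    print("transformers is not installed in this environment")
```

In a notebook, the switch itself would typically be a `pip install "transformers==4.43.3"` cell followed by a kernel restart.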

Actually, this IS THE MOST CONSISTENT PROMPT:

Small:

description = """
Laura's voice is expressive and dramatic in delivery, speaking at a fast pace with a very close recording that almost has no background noise.
"""

Large:

description = """
Alisa's voice is consistent, quite expressive and dramatic in delivery, with a very close recording that almost has no background noise.
"""

Small:

description = """
Jenna's voice is consistent, quite expressive and dramatic in delivery, with a very close recording that almost has no background noise.
"""
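For context, here is roughly how one of these descriptions gets used with Parler-TTS, following the usage pattern from the Parler-TTS README. Treat it as a sketch: the `synthesize` helper and the `DESCRIPTIONS` dict are hypothetical names, and reading "Small" as parler-tts-mini-v1 is an assumption.

```python
# Prompts copied from the notes above. Which checkpoint each one was tuned on is an
# assumption: "Small" is read here as parler-tts-mini-v1, "Large" as the larger model.
DESCRIPTIONS = {
    "Laura": "Laura's voice is expressive and dramatic in delivery, speaking at a fast pace with a very close recording that almost has no background noise.",
    "Alisa": "Alisa's voice is consistent, quite expressive and dramatic in delivery, with a very close recording that almost has no background noise.",
    "Jenna": "Jenna's voice is consistent, quite expressive and dramatic in delivery, with a very close recording that almost has no background noise.",
}
MODEL_ID = "parler-tts/parler-tts-mini-v1"


def synthesize(text: str, speaker: str = "Laura"):
    """Return (audio_array, sampling_rate) for `text` spoken in the described voice."""
    # Imported lazily so the prompts can be inspected without the heavy dependencies.
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(MODEL_ID).to(device)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    # The description steers the voice; the prompt is the text that gets spoken.
    input_ids = tokenizer(DESCRIPTIONS[speaker], return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    return generation.cpu().numpy().squeeze(), model.config.sampling_rate
```

Calling `synthesize("Welcome to the show!", "Jenna")` downloads the checkpoint on first use, so expect the first run to be slow.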

Bark is cool, but only v6 works great. I tried v9, but it's quite robotic, and that is sad.

So Parler is next; it's quite cool for prompting

XTTS-v2 by Coqui is cool; however, I need to check the license. I think an example is allowed

Tortoise is blocked because it needs an HF library version that doesn't work with Llama-3.2 models, so I will probably need to make a separate env. Need to evaluate whether it's worth it

Side note: The TTS library is a really cool effort!

Bark-Tests: Best results for speaker/v6 are at:

speech_output = model.generate(**inputs, temperature=0.9, semantic_temperature=0.8)
Audio(speech_output[0].cpu().numpy(), rate=sampling_rate)
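To make that snippet self-contained, here is a sketch of the surrounding setup using the Bark classes in transformers. The temperatures are the ones from the note above; mapping "speaker/v6" to the `v2/en_speaker_6` voice preset is an assumption, and `bark_generate` is a hypothetical helper name.

```python
# Sampling settings from the note above.
BARK_GENERATE_KWARGS = {"temperature": 0.9, "semantic_temperature": 0.8}
# Assumption: "speaker/v6" refers to this English voice preset.
VOICE_PRESET = "v2/en_speaker_6"


def bark_generate(text: str):
    """Return (audio_array, sampling_rate) for `text` using Bark."""
    # Imported lazily so the settings can be inspected without downloading the model.
    from transformers import AutoProcessor, BarkModel

    processor = AutoProcessor.from_pretrained("suno/bark")
    model = BarkModel.from_pretrained("suno/bark")

    inputs = processor(text, voice_preset=VOICE_PRESET)
    speech_output = model.generate(**inputs, **BARK_GENERATE_KWARGS)
    sampling_rate = model.generation_config.sample_rate
    return speech_output[0].cpu().numpy(), sampling_rate
```

In a notebook, the returned pair can be passed straight to `IPython.display.Audio(audio, rate=sampling_rate)` to listen to the result.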

Tested sound effects:

  • Laugh is probably most effective
  • Sigh is hit or miss
  • Gasps don't work
  • A single hyphen is effective
  • Capitalisation makes it louder
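The effects that worked can be combined in a single text prompt, for example as below. This is a sketch with hypothetical helper names; the exact `[laughs]` tag spelling is an assumption based on the tags Bark documents, since the notes above only say "Laugh".

```python
def emphasize(sentence: str, word: str) -> str:
    """Upper-case every standalone occurrence of `word` so it is spoken louder."""
    return " ".join(
        w.upper() if w.strip(".,!?").lower() == word.lower() else w
        for w in sentence.split()
    )


def with_pause(first: str, second: str) -> str:
    """Join two phrases with a single hyphen, which reads as a short pause."""
    return f"{first} - {second}"


def with_laugh(sentence: str) -> str:
    """Append a laugh tag (assumed spelling) to the end of the line."""
    return f"{sentence} [laughs]"


line = with_laugh(emphasize(with_pause("Wait", "you built this in one weekend?"), "one"))
print(line)  # Wait - you built this in ONE weekend? [laughs]
```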

Ignore/Delete this in final stages, right now this is a "vibe-check" for TTS model(s):

Starting with: Bark but if it falls apart, here is the order

Vibe check:

Higher barrier to testing (in other words, I was too lazy to test):

Try later:

Resources used for learning: