radu/LLamaRecipes @ ae04cd3052ed8fb18192dc2439480f4cef179a66

Sanyam Bhutani ae04cd3052 Update README.md		11 months ago
resources	5ce0b098ce final_runs	1 year ago
Bark-Testing.ipynb	0bd41ea488 Sweeps added	1 year ago
Parler-Testing.ipynb	867d99e625 Create Parler-Testing.ipynb	1 year ago
Prompt_testing.md	eba932ba1f Update Prompt_testing.md	1 year ago
README.md	ae04cd3052 Update README.md	11 months ago
Step-1 PDF-Pre-Processing-Logic.ipynb	e84dc568db Polish out notebooks and worflow	1 year ago
Step-2-Transcript-Writer.ipynb	ca0221f279 Semi-Final-runs	1 year ago
Step-3-Re-Writer.ipynb	5ce0b098ce final_runs	1 year ago
Step-4-TTS-Workflow.ipynb	5ce0b098ce final_runs	1 year ago

NotebookLlama: An Open Source version of NotebookLM

Author: Sanyam Bhutani

This is a guided series of tutorials/notebooks that can be used as a reference or as a course for building a PDF-to-podcast workflow.

Here is the outline:

Step 1: Pre-process PDF: Use Llama-3.2-1B to pre-process a PDF and save the cleaned text
Step 2: Transcript Writer: Use Llama-3.1-70B model to write a podcast transcript from the text
Step 3: Dramatic Re-Writer: Use Llama-3.1-8B model to make the transcript more dramatic
Step 4: Text-To-Speech Workflow: Use parler-tts/parler-tts-mini-v1 and suno/bark to generate a conversational podcast
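Steps 1 to 3 form a linear text pipeline: each stage consumes the previous stage's output. A minimal sketch of that structure, assuming each step is wrapped in a plain Python callable. The `Stage` class, `run_pipeline`, and the lambda stand-ins are hypothetical and are not the notebooks' actual code:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One text-to-text step of the PDF-to-podcast workflow."""
    name: str
    model: str                 # model named in the outline, kept for bookkeeping only
    run: Callable[[str], str]  # hypothetical: previous output in, next output out


def run_pipeline(stages: List[Stage], source: str) -> str:
    """Feed the output of each stage into the next one, in order."""
    data = source
    for stage in stages:
        data = stage.run(data)
    return data


# Hypothetical placeholder stages; in the real notebooks each step prompts a Llama model.
stages = [
    Stage("pre-process", "Llama-3.2-1B", lambda text: " ".join(text.split())),
    Stage("transcript", "Llama-3.1-70B", lambda text: f"SPEAKER 1: {text}"),
    Stage("re-write", "Llama-3.1-8B", lambda text: text + " Wow!"),
]

print(run_pipeline(stages, "  Some   extracted\nPDF text  "))
```

Step 4 returns audio rather than text, so it would sit at the end of this chain and consume the final transcript.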

Steps to run the notebooks:

TODO

Next-Improvements/Further ideas:

Speech Model experimentation: The TTS model is the main limit on how natural this will sound. This can probably be improved with a better pipeline.
LLM vs LLM Debate: Another approach to writing the podcast would be to have two agents debate the topic of interest and write the podcast outline. Right now we use a single LLM (70B) to write the podcast outline.
Testing the 405B model for writing the transcripts
Better prompting
Support for ingesting a website, audio file, YouTube links and more. We welcome community PRs!

Scratch-pad/Running Notes:

Right now there is one issue: Parler needs transformers 4.43.3 or earlier, but generating needs the latest version, so I am just switching between versions on the fly.
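Because Parler needs transformers 4.43.3 or earlier, a quick guard before loading it can catch the wrong environment early. A minimal sketch, assuming versions follow plain `major.minor.patch` numbering (pre-release suffixes such as `.dev0` are simply ignored); `parse_version` and `parler_compatible` are hypothetical helper names:

```python
import re
from importlib.metadata import PackageNotFoundError, version

PARLER_MAX_TRANSFORMERS = (4, 43, 3)  # upper bound taken from the note above


def parse_version(text: str) -> tuple:
    """Turn '4.43.3' into (4, 43, 3) and '4.44.0.dev0' into (4, 44, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '4.40'
    return tuple(parts)


def parler_compatible(text: str) -> bool:
    """True if this transformers version is at most 4.43.3."""
    return parse_version(text) <= PARLER_MAX_TRANSFORMERS


if __name__ == "__main__":
    try:
        installed = version("transformers")
        status = "OK for Parler" if parler_compatible(installed) else "too new for Parler"
        print(installed, status)
    except PackageNotFoundError:
        print("transformers is not installed")
```

If the `packaging` library is available, `packaging.version.Version` implements the full PEP 440 ordering and is the more robust choice.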

Actually, this IS THE MOST CONSISTENT PROMPT:

Small:

description = """
Laura's voice is expressive and dramatic in delivery, speaking at a fast pace with a very close recording that almost has no background noise.
"""

Large:

description = """
Alisa's voice is consistent, quite expressive and dramatic in delivery, with a very close recording that almost has no background noise.
"""

Small:

description = """
Jenna's voice is consistent, quite expressive and dramatic in delivery, with a very close recording that almost has no background noise.
"""
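The Alisa and Jenna descriptions above are identical apart from the speaker name, so they can be produced from one template. A minimal sketch that builds such a description and passes it to Parler-TTS, following the generation pattern in the parler-tts README (`ParlerTTSForConditionalGeneration` plus one tokenizer for both the description and the spoken text). The helper names `make_description` and `synthesize` are hypothetical, and the default model ID assumes the mini checkpoint named in Step 4:

```python
def make_description(speaker: str, delivery: str) -> str:
    """Fill the template shared by the Alisa and Jenna prompts."""
    return (
        f"{speaker}'s voice is {delivery}, "
        "with a very close recording that almost has no background noise."
    )


def synthesize(text: str, description: str,
               model_id: str = "parler-tts/parler-tts-mini-v1",
               out_path: str = "parler_out.wav") -> str:
    """Generate speech with Parler-TTS and write it to a WAV file."""
    # Heavy imports live inside the function so make_description works without them.
    import soundfile as sf
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(model_id).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # The description conditions the voice; the prompt is the text that gets spoken.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    audio = model.generate(input_ids=input_ids, prompt_input_ids=prompt_ids)

    sf.write(out_path, audio.cpu().numpy().squeeze(), model.config.sampling_rate)
    return out_path


desc = make_description("Jenna", "consistent, quite expressive and dramatic in delivery")
# synthesize("Welcome to the podcast!", desc)  # needs parler-tts, torch and a model download
```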

Bark is cool, but only v6 works great. I tried v9, but it's quite robotic, which is sad.

So Parler is next; it's quite cool for prompting.

XTTS-v2 by Coqui is cool; however, I need to check the license. I think an example is allowed.

Tortoise is blocked because it needs a Hugging Face (transformers) version that doesn't work with the Llama-3.2 models, so I will probably need to make a separate environment. I need to evaluate whether it's worth it.

Side note: The TTS library is a really cool effort!

Bark tests: the best results for speaker/v6 come from these settings:

from IPython.display import Audio
speech_output = model.generate(**inputs, temperature=0.9, semantic_temperature=0.8)
Audio(speech_output[0].cpu().numpy(), rate=sampling_rate)
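For context, a fuller sketch of that Bark call through the Hugging Face transformers integration (`AutoProcessor` and `BarkModel`). It assumes "v6" refers to the `v2/en_speaker_6` voice preset; `bark_preset` and `bark_speak` are hypothetical helper names. In `BarkModel.generate`, unprefixed kwargs such as `temperature` are passed to every Bark sub-model, while `semantic_`-prefixed ones only reach the semantic model and take priority there:

```python
def bark_preset(speaker: int, lang: str = "en") -> str:
    """Build a Bark voice-preset name such as 'v2/en_speaker_6' (assumed naming)."""
    return f"v2/{lang}_speaker_{speaker}"


def bark_speak(text: str, speaker: int = 6):
    """Generate audio with Bark using the settings from the note above."""
    # Imported lazily so bark_preset can be used without these packages.
    from IPython.display import Audio
    from transformers import AutoProcessor, BarkModel

    processor = AutoProcessor.from_pretrained("suno/bark")
    model = BarkModel.from_pretrained("suno/bark")

    inputs = processor(text, voice_preset=bark_preset(speaker))
    speech_output = model.generate(**inputs, temperature=0.9, semantic_temperature=0.8)

    sampling_rate = model.generation_config.sample_rate
    return Audio(speech_output[0].cpu().numpy(), rate=sampling_rate)


# bark_speak("Hello, and welcome to the show!")  # downloads suno/bark on first run
```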

Tested sound effects:

Laugh is probably the most effective
Sigh is hit or miss
Gasps don't work
A single hyphen is effective
Capitalisation makes it louder
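Bark reads these cues from the text itself: bracketed non-speech tokens such as `[laughs]` or `[sighs]`, dashes for hesitations, and capitalised words for emphasis (as listed on the suno/bark model card). A minimal sketch of a hypothetical `add_bark_cues` helper that applies the cues that tested well above (laughter, a single hyphen, capitalisation) to a transcript line before it goes to Bark; using the hyphen as a short pause after a chosen word is an assumption:

```python
def add_bark_cues(line: str, emphasize=(), laugh: bool = False, pause_after: str = "") -> str:
    """Decorate one transcript line with Bark-style cues.

    emphasize:   words to capitalise (capitalisation makes them louder)
    laugh:       append a [laughs] token (the most reliable effect in testing)
    pause_after: insert a single hyphen after the first occurrence of this word
    """
    targets = {w.lower() for w in emphasize}
    out = []
    paused = False
    for word in line.split():
        core = word.strip(".,!?").lower()  # compare words without trailing punctuation
        if core in targets:
            word = word.upper()
        out.append(word)
        if pause_after and not paused and core == pause_after.lower():
            out.append("-")
            paused = True
    if laugh:
        out.append("[laughs]")
    return " ".join(out)


print(add_bark_cues("Well that is really surprising.", emphasize=["really"], laugh=True, pause_after="well"))
```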

Ignore/delete this in the final stages; right now this is a "vibe check" for TTS model(s):

https://github.com/SWivid/F5-TTS: latest and most popular, but "feels robotic"
Reddit says the earlier E2 model is better

Starting with Bark, but if it falls apart, here is the order:

Vibe check:

This is the most popular (ever) on HF and features different accents, but the samples feel a little robotic and show no accent difference: https://huggingface.co/myshell-ai/MeloTTS-English
Seems to have great documentation but is still a bit robotic for my liking: https://coqui.ai/blog/tts/open_xtts
Super easy to use, with laughter etc., but very slightly robotic: https://huggingface.co/suno/bark
This is THE MOST NATURAL SOUNDING: https://huggingface.co/WhisperSpeech/WhisperSpeech
This has a lot of promise; even though it's robotic, we can use natural voice to add filters or effects: https://huggingface.co/spaces/parler-tts/parler_tts
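The plan of starting with Bark and moving down a list of alternatives if it falls apart can be sketched as an ordered fallback. Each backend here is a hypothetical callable (text in, audio bytes out) that raises on failure; `synthesize_with_fallback`, `broken_bark`, and `stub_parler` are illustrative names, not code from the notebooks:

```python
from typing import Callable, List, Tuple

Backend = Tuple[str, Callable[[str], bytes]]


def synthesize_with_fallback(text: str, backends: List[Backend]) -> Tuple[str, bytes]:
    """Try each TTS backend in order and return the first successful result."""
    errors = []
    for name, speak in backends:
        try:
            return name, speak(text)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS backends failed: " + "; ".join(errors))


def broken_bark(text: str) -> bytes:
    raise RuntimeError("output falls apart")  # stand-in for a failing backend


def stub_parler(text: str) -> bytes:
    return text.encode("utf-8")  # stand-in for real audio bytes


print(synthesize_with_fallback("Hello!", [("bark", broken_bark), ("parler", stub_parler)]))
```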

Higher barrier to testing (in other words, I was too lazy to test):

Try later:

Whisper Colab:
https://huggingface.co/parler-tts/parler-tts-large-v1
https://huggingface.co/myshell-ai/MeloTTS-English
Bark: https://huggingface.co/suno/bark (This has been insanely popular)
https://huggingface.co/facebook/mms-tts-eng
https://huggingface.co/fishaudio/fish-speech-1.4
https://huggingface.co/mlx-community/mlx_bark
https://huggingface.co/metavoiceio/metavoice-1B-v0.1
https://huggingface.co/suno/bark-small

Resources used for learning: