Author: Sanyam Bhutani
This is a guided series of tutorials/notebooks that can be taken as a reference or course to build a PDF to Podcast workflow.
Here is the outline:
1. Use Llama-3.2-1B to pre-process and save a PDF
2. Use Llama-3.1-70B model to write a podcast transcript from the text
3. Use Llama-3.1-8B model to make the transcript more dramatic
4. Use parler-tts/parler-tts-mini-v1 and bark/suno to generate a conversational podcast

TODO
So right now there is one issue: Parler needs transformers 4.43.3 or earlier, but generation needs the latest version, so for now I am just switching versions on the fly.
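The version switch described above could be scripted roughly as follows. This is only a sketch: the helper names (`parse_version`, `parler_needs_downgrade`, `pip_command_for_parler`) are hypothetical and not part of the notebooks, and the only fact taken from the note is the 4.43.3 ceiling for Parler.

```python
import re
import sys
from importlib import metadata

# Ceiling taken from the note above: Parler needs transformers 4.43.3 or earlier.
PARLER_MAX_TRANSFORMERS = "4.43.3"


def parse_version(version: str) -> tuple:
    """Crudely turn '4.43.3' or '4.45.0.dev0' into a comparable tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at non-numeric pieces such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def parler_needs_downgrade(installed: str, ceiling: str = PARLER_MAX_TRANSFORMERS) -> bool:
    """Return True when the installed transformers is newer than Parler tolerates."""
    return parse_version(installed) > parse_version(ceiling)


def pip_command_for_parler(ceiling: str = PARLER_MAX_TRANSFORMERS) -> list:
    """Build (but do not run) the pip command that pins transformers for Parler."""
    return [sys.executable, "-m", "pip", "install", f"transformers<={ceiling}"]


def installed_transformers_version():
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_transformers_version()
    if current is not None and parler_needs_downgrade(current):
        print("Before the Parler step, run:", " ".join(pip_command_for_parler()))
```

The command is only printed, not executed, so the switch stays an explicit manual step; a notebook kernel usually needs a restart after the package changes anyway.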
Actually this IS THE MOST CONSISTENT PROMPT:
Small:
description = """
Laura's voice is expressive and dramatic in delivery, speaking at a fast pace with a very close recording that almost has no background noise.
"""
Large:
description = """
Alisa's voice is consistent, quite expressive and dramatic in delivery, with a very close recording that almost has no background noise.
"""
Small:
description = """
Jenna's voice is consistent, quite expressive and dramatic in delivery, with a very close recording that almost has no background noise.
"""
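For context, here is a minimal sketch of how descriptions like these are typically passed to Parler-TTS. It assumes the `parler_tts` package is installed and that "Small" and "Large" refer to the `parler-tts/parler-tts-mini-v1` and `parler-tts/parler-tts-large-v1` checkpoints; the `synthesize` helper and the dictionary layout are hypothetical, not the notebook's code.

```python
# Voice descriptions copied from the notes above, keyed by speaker name.
# Each entry is (model size, description).
DESCRIPTIONS = {
    "Laura": (
        "small",
        "Laura's voice is expressive and dramatic in delivery, speaking at a fast pace "
        "with a very close recording that almost has no background noise.",
    ),
    "Alisa": (
        "large",
        "Alisa's voice is consistent, quite expressive and dramatic in delivery, "
        "with a very close recording that almost has no background noise.",
    ),
    "Jenna": (
        "small",
        "Jenna's voice is consistent, quite expressive and dramatic in delivery, "
        "with a very close recording that almost has no background noise.",
    ),
}

# Assumption: these are the checkpoints meant by "Small" and "Large".
CHECKPOINTS = {
    "small": "parler-tts/parler-tts-mini-v1",
    "large": "parler-tts/parler-tts-large-v1",
}


def synthesize(text: str, speaker: str):
    """Generate speech for `text` in the voice described for `speaker`."""
    # Heavy imports stay local so the constants above are usable without them.
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    size, description = DESCRIPTIONS[speaker]
    checkpoint = CHECKPOINTS[size]
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = ParlerTTSForConditionalGeneration.from_pretrained(checkpoint).to(device)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # The description conditions the voice; the prompt is the text to be spoken.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    return generation.cpu().numpy().squeeze(), model.config.sampling_rate
```

Calling `synthesize("Welcome to the show!", "Laura")` would return a waveform array and its sampling rate, which can then be played in a notebook or written to disk.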
Bark is cool, but only v6 works great; I tried v9 but it's quite robotic, and that is sad.
So Parler is next; it's quite cool for prompting.
xTTS-v2 by Coqui is cool; however, I need to check the license. I think an example is allowed.
Tortoise is blocked because it needs an HF version that doesn't work with llama-3.2 models, so I will probably need to make a separate env. Need to eval if it's worth it.
Side note: The TTS library is a really cool effort!
Bark-Tests: Best results for speaker/v6 are at:
speech_output = model.generate(**inputs, temperature=0.9, semantic_temperature=0.8)
Audio(speech_output[0].cpu().numpy(), rate=sampling_rate)
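The two lines above rely on `model`, `inputs`, and `sampling_rate` already existing. A minimal sketch of the surrounding setup, using the Bark classes in Hugging Face `transformers`, might look like this. It assumes that "speaker/v6" refers to the `v2/en_speaker_6` voice preset and that the `suno/bark` checkpoint is used; the `bark_speak` helper is hypothetical.

```python
# Sampling settings quoted in the note above.
BARK_GENERATION_KWARGS = {"temperature": 0.9, "semantic_temperature": 0.8}

# Assumption: "speaker/v6" means this Bark voice preset.
VOICE_PRESET = "v2/en_speaker_6"


def bark_speak(text: str, voice_preset: str = VOICE_PRESET):
    """Generate speech with Bark and return (waveform, sampling_rate)."""
    # Heavy imports stay local so the settings above are usable without them.
    import torch
    from transformers import AutoProcessor, BarkModel

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = AutoProcessor.from_pretrained("suno/bark")
    model = BarkModel.from_pretrained("suno/bark").to(device)

    # The processor tokenizes the text and attaches the speaker embedding.
    inputs = processor(text, voice_preset=voice_preset).to(device)

    # `temperature` applies to all Bark sub-models; the `semantic_` prefix
    # overrides it for the semantic stage only.
    speech_output = model.generate(**inputs, **BARK_GENERATION_KWARGS)
    sampling_rate = model.generation_config.sample_rate
    return speech_output[0].cpu().numpy(), sampling_rate
```

In a notebook, the result can be played with `Audio(waveform, rate=sampling_rate)` from `IPython.display`, as in the note above.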
Tested sound effects:
Ignore/delete this in the final stages; right now this is a "vibe-check" for TTS model(s):
Starting with Bark, but if it falls apart, here is the order:
Vibe check:
Higher barrier to testing (in other words, I was too lazy to test):
Try later: