This is a guided series of tutorials/notebooks that can be taken as a reference or course to build a PDF to Podcast workflow.
You will also learn from my experiments with Text-to-Speech models.
It assumes zero knowledge of LLMs, prompting, and audio models; everything is covered in the respective notebooks.
Here is the step-by-step thought (pun intended) for the task:
- `Llama-3.2-1B` to pre-process and save a PDF
- `Llama-3.1-70B` model to write a podcast transcript from the text
- `Llama-3.1-8B` model to make the transcript more dramatic
- `parler-tts/parler-tts-mini-v1` and `bark/suno` to generate a conversational podcast

Requirements: GPU server or an API provider for using 70B, 8B and 1B Llama models.
Note: For our GPU-Poor friends, you can also use the 8B and lower models for the entire pipeline. There is no strong recommendation: the pipeline below is simply what worked best on the first few tests. You should try and see what works best for you!
git clone https://github.com/meta-llama/llama-recipes
cd llama-recipes/recipes/quickstart/NotebookLlama/
pip install -r requirements.txt
This notebook processes the PDF and, using the new Featherlight model, converts it into a `.txt` file.
Decide on a PDF you would like to use for Notebook 1; it can be any link, but please remember to update the first cell of the notebook with the right one.
Please try changing the prompts for the `Llama-3.2-1B-Instruct` model and see if you can improve results.
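If you want a feel for the flow before opening the notebook, here is a minimal sketch of the idea, assuming PyPDF2 for text extraction; the chunk size, prompt wording, and file names are illustrative, not the notebook's exact values:

```python
# Illustrative sketch of the Notebook 1 idea: extract raw text from a PDF,
# then ask the small Llama model to clean it up chunk by chunk.
# Chunk size, prompt wording and file names here are assumptions.
import torch
from PyPDF2 import PdfReader
from transformers import pipeline

def extract_text(pdf_path: str) -> str:
    """Pull plain text out of every page of the PDF."""
    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

cleaner = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

SYSTEM_PROMPT = (
    "You are a text cleaner. Remove page numbers, broken line breaks and "
    "encoding artifacts from the text. Do NOT summarize or rewrite the content."
)

def clean_chunk(chunk: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": chunk},
    ]
    out = cleaner(messages, max_new_tokens=512)
    # The pipeline returns the chat with the assistant's reply appended last
    return out[0]["generated_text"][-1]["content"]

raw = extract_text("input.pdf")
chunks = [raw[i : i + 2000] for i in range(0, len(raw), 2000)]  # naive fixed-size chunking
cleaned = "\n".join(clean_chunk(c) for c in chunks)

with open("clean_extracted_text.txt", "w") as f:
    f.write(cleaned)
```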
This notebook will take the processed output from Notebook 1 and creatively convert it into a podcast transcript using the `Llama-3.1-70B-Instruct` model. If you are GPU-rich, or even just generally rich, please feel free to test with the 405B model!
Please try experimenting with the system prompt for the model and see if you can improve the results, and also try the 8B model here to see if there is a huge difference!
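In spirit, the notebook does something like the sketch below; the system prompt is a heavily abbreviated stand-in for the notebook's much more detailed one, and the generation settings and file names are assumptions:

```python
# Sketch of the Notebook 2 flow: hand the cleaned text to a large Llama model
# with a "podcast writer" system prompt. Prompt wording, token budget and
# file names are illustrative assumptions.
import torch
from transformers import pipeline

SYSTEM_PROMPT = (
    "You are a world-class podcast writer. Rewrite the following text as an "
    "engaging two-speaker podcast transcript. Speaker 1 leads and explains; "
    "Speaker 2 asks curious follow-up questions and goes on interesting tangents."
)

writer = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-70B-Instruct",  # swap in the 8B model if you are GPU-poor
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

with open("clean_extracted_text.txt") as f:  # output of the previous step (name assumed)
    source_text = f.read()

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": source_text},
]

transcript = writer(messages, max_new_tokens=4096)[0]["generated_text"][-1]["content"]

with open("podcast_transcript.txt", "w") as f:
    f.write(transcript)
```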
This notebook takes the transcript from earlier and prompts `Llama-3.1-8B-Instruct` to add more dramatisation and interruptions in the conversations.
There is also a key factor here: we return the conversation as tuples, which makes our lives easier later. Yes, studying Data Structures 101 was actually useful for once!
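Concretely, the hand-off to the TTS notebook looks roughly like this (the speaker labels follow the two-speaker setup; the example lines are made up):

```python
# Roughly the shape the re-writer hands to the TTS notebook: an ordered list of
# (speaker, line) pairs, so each turn can later be routed to the right TTS model.
# The lines below are made up for illustration.
PODCAST_TEXT = [
    ("Speaker 1", "Welcome back to the show! Today we're talking about knowledge distillation."),
    ("Speaker 2", "Umm, wait, distillation? Like... whiskey? [laughs]"),
    ("Speaker 1", "Ha, not quite. Think of a big model teaching a much smaller one."),
]

# Iterating is then trivial: each speaker gets its own TTS pipeline later on.
for speaker, line in PODCAST_TEXT:
    print(f"{speaker}: {line}")
```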
For our TTS logic, we use two different models that behave differently with certain prompts. So we prompt the model to add specifics for each speaker accordingly.
Please again try changing the system prompt and see if you can improve the results. We encourage testing the Featherlight 3B and 1B models at this stage as well.
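To make "specifics for each speaker" concrete, the re-writer's system prompt can spell out what each TTS engine handles well; the wording below is an illustrative assumption, not the notebook's actual prompt:

```python
# Illustrative re-writer system prompt (not the notebook's actual wording):
# Speaker 1 is voiced by Parler-TTS, which prefers clean text, while Speaker 2
# is voiced by Bark, which can render expressive cues like "umm" or "[laughs]".
REWRITER_SYSTEM_PROMPT = """
You rewrite podcast transcripts to be more dramatic and conversational.
Return the result as a list of ("Speaker 1", text) / ("Speaker 2", text) tuples.

Rules:
- Speaker 1: keep the text clean, with no filler sounds or bracketed sound effects.
- Speaker 2: feel free to add "umm", "hmm", "[laughs]" and "[sigh]" for realism.
- Speaker 2 should interrupt with questions and wild tangents.
"""
```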
Finally, we take the results from the last notebook and convert them into a podcast. We use the `parler-tts/parler-tts-mini-v1` and `bark/suno` models for a conversation.
The speakers and the prompt for the parler model were decided based on experimentation and suggestions from the model authors. Please try experimenting; you can find more details in the resources section.
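Here is a minimal sketch of that two-model setup, assuming Speaker 1 goes through Parler-TTS and Speaker 2 through Bark; the voice description, the `v2/en_speaker_6` preset and the `suno/bark` Hub id are assumptions of this sketch, and stitching the per-turn clips into one podcast file is skipped:

```python
# Minimal sketch: voice Speaker 1 with Parler-TTS and Speaker 2 with Bark.
# The voice description, the "v2/en_speaker_6" preset and the "suno/bark" Hub id
# are assumptions; stitching the per-turn clips into one file is left out.
import torch
import soundfile as sf
from transformers import AutoProcessor, AutoTokenizer, BarkModel
from parler_tts import ParlerTTSForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

# Speaker 1: Parler-TTS, steered by a natural-language description of the voice
parler = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1").to(device)
parler_tok = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")
VOICE_DESCRIPTION = "A female speaker with an expressive, animated delivery and very clear audio."

def speak_parler(text: str, path: str) -> None:
    desc_ids = parler_tok(VOICE_DESCRIPTION, return_tensors="pt").input_ids.to(device)
    prompt_ids = parler_tok(text, return_tensors="pt").input_ids.to(device)
    audio = parler.generate(input_ids=desc_ids, prompt_input_ids=prompt_ids)
    sf.write(path, audio.cpu().numpy().squeeze(), parler.config.sampling_rate)

# Speaker 2: Bark, which renders expressive cues like "[laughs]" and "umm" well
bark_proc = AutoProcessor.from_pretrained("suno/bark")
bark = BarkModel.from_pretrained("suno/bark").to(device)

def speak_bark(text: str, path: str) -> None:
    inputs = bark_proc(text, voice_preset="v2/en_speaker_6").to(device)
    audio = bark.generate(**inputs)
    sf.write(path, audio.cpu().numpy().squeeze(), bark.generation_config.sample_rate)

speak_parler("Welcome back to the show! Today we're talking about knowledge distillation.", "turn_000.wav")
speak_bark("Umm, wait, distillation? Like... whiskey? [laughs]", "turn_001.wav")
```

In practice, you would loop over the tuples from Notebook 3, call the right function per speaker, and then concatenate the generated clips into the final podcast.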