Browse source code

added some notes

Sanyam Bhutani 6 months ago
parent
commit
5b55693059

+ 25 - 14
recipes/quickstart/NotebookLlama/Step-1 PDF-Pre-Processing-Logic.ipynb

@@ -2,10 +2,33 @@
  "cells": [
   {
    "cell_type": "markdown",
+   "id": "4f67a6a6",
+   "metadata": {},
+   "source": [
+    "## Notebook 1: PDF Pre-processing"
+   ]
+  },
+  {
+   "cell_type": "markdown",
    "id": "f68aee84-04e3-4cbc-be78-6de9e06e704f",
    "metadata": {},
    "source": [
-    "Notebook for uploading PDF, extracting all Text and Pre-Processing using a 1B or 3B model"
+    "In the series, we will be going from a PDF to Podcast using all open models. \n",
+    "\n",
+    "The first step in getting to the podcast is finding a script, right now our logic is:\n",
+    "- Use any PDF on any topic\n",
+    "- Prompt `Llama-3.2-1B-Instruct` model to process it into a text file\n",
+    "- Re-write this into a podcast transcript in next notebook.\n",
+    "\n",
+    "In this notebook, we will upload a PDF and save it into a `.txt` file using the `PyPDF2` library, later we will process chunks from the text file using our featherlight model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "61cb3584",
+   "metadata": {},
+   "source": [
+    "Most of us shift-enter pass the comments to realise later we need to install libraries. For the few that read the instructions, please remember to do so:"
    ]
   },
   {
@@ -27,19 +50,7 @@
    "outputs": [],
    "source": [
     "pdf_path = './2402.13116v3.pdf'\n",
-    "DEFAULT_MODEL = \"meta-llama/Llama-3.2-1B-Instruct\"\n",
-    "#DEFAULT_MODEL = \"meta-llama/Llama-3.2-1B-Instruct\" <- Don't think this would be necessary"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 31,
-   "id": "9418ac5e-df65-4c03-ac64-48a1275afa39",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from difflib import HtmlDiff\n",
-    "from IPython.display import HTML, display"
+    "DEFAULT_MODEL = \"meta-llama/Llama-3.2-1B-Instruct\""
    ]
   },
   {

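The extraction step described in the first markdown cell comes down to a few lines of `PyPDF2`. A minimal sketch, reusing the notebook's `pdf_path`; the output filename is an illustrative assumption, and the notebook's own validation and chunking logic is omitted:

```python
# Minimal sketch of the PDF -> .txt step described above, using PyPDF2.
# 'extracted_text.txt' is an illustrative filename, not necessarily the
# one the notebook writes.
from PyPDF2 import PdfReader

pdf_path = './2402.13116v3.pdf'
reader = PdfReader(pdf_path)

# extract_text() can return None for pages with no extractable text,
# so fall back to an empty string per page.
text = "\n".join(page.extract_text() or "" for page in reader.pages)

with open('extracted_text.txt', 'w', encoding='utf-8') as f:
    f.write(text)
```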
+ 1 - 1
recipes/quickstart/NotebookLlama/Step-2-Transcript-Writer.ipynb

@@ -10,7 +10,7 @@
     "SYSTEMP_PROMPT = \"\"\"\n",
     "You are the a world-class podcast writer, you have worked as a ghost writer for Joe Rogan, Lex Fridman, Ben Shapiro, Tim Ferris. \n",
     "\n",
-    "Actually you were the one that scripted their entire shows.\n",
+    "We are in an alternate universe where actually you have been writing every line they say and they just stream it into their brains.\n",
     "\n",
     "You have won multiple podcast awards for your writing.\n",
     " \n",

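For context, a system prompt like `SYSTEMP_PROMPT` above is passed as the first chat message. A hedged sketch using the `transformers` text-generation pipeline; the model id, generation settings, and input filename are placeholder assumptions, not necessarily what the notebook uses:

```python
# Sketch: feeding a SYSTEMP_PROMPT-style system message to a Llama chat
# model via transformers. Model id and max_new_tokens are assumptions.
import torch
from transformers import pipeline

SYSTEMP_PROMPT = """
You are a world-class podcast writer...
"""  # abbreviated stand-in for the full prompt in the notebook

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # placeholder model id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": SYSTEMP_PROMPT},
    # Step 1 output; the filename is an assumption for illustration.
    {"role": "user", "content": open("extracted_text.txt").read()},
]

outputs = pipe(messages, max_new_tokens=1024)
print(outputs[0]["generated_text"][-1]["content"])  # the assistant's transcript draft
```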
+ 8 - 0
recipes/quickstart/NotebookLlama/Step-4-TTS-Workflow.ipynb

@@ -2,6 +2,14 @@
  "cells": [
   {
    "cell_type": "markdown",
+   "id": "c31c0e37",
+   "metadata": {},
+   "source": [
+    "## Notebook 4: TTS Workflow"
+   ]
+  },
+  {
+   "cell_type": "markdown",
    "id": "be20fda2-409e-4d86-b502-33aee1a73151",
    "metadata": {},
    "source": [

+ 27 - 30
recipes/quickstart/NotebookLlama/TTS_Notes.md

@@ -1,16 +1,15 @@
 ### Notes from TTS Experimentation
 
-For the TTS Pipeline, *all* of the top models from HuggingFace and Reddit were tested. 
+For the TTS Pipeline, *all* of the top models from HuggingFace and Reddit were tried. 
 
-Tested how? 
+The goal was to use models that were easy to set up and sounded less robotic, with the ability to include sound effects like laughter, etc.
 
-It was a simple vibe test of checking which sounds less robotic. Promoising directions to explore in future:
+#### Parler-TTS
 
-- [MeloTTS](huggingface.co/myshell-ai/MeloTTS-English) This is most popular (ever) on HuggingFace
-- [WhisperSpeech](https://huggingface.co/WhisperSpeech/WhisperSpeech) sounded quite natural as well
-- 
 
 
+Surprisingly, Parler's mini model sounded more natural. In their [repo](https://github.com/huggingface/parler-tts) they share the names of speakers that we can use in the prompt.
+
 Actually this IS THE MOST CONSISTENT PROMPT:
 Small:
 ```
@@ -32,15 +31,9 @@ Jenna's voice is consistent, quite expressive and dramatic in delivery, with a v
 """
 ```
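For reference, a description prompt like the one above is what conditions the voice in Parler. A minimal sketch using the `parler_tts` package, assuming the mini checkpoint (`parler-tts/parler-tts-mini-v1`); the speaker description and spoken text here are illustrative, not the exact prompt above:

```python
# Sketch of conditioning Parler-TTS on a speaker-description prompt.
# Checkpoint name, description, and text are assumptions for illustration.
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
repo = "parler-tts/parler-tts-mini-v1"  # assumed mini checkpoint
model = ParlerTTSForConditionalGeneration.from_pretrained(repo).to(device)
tokenizer = AutoTokenizer.from_pretrained(repo)

# The description steers the voice; the text is what gets spoken.
description = "Laura's voice is expressive and dramatic in delivery, with a very close recording that almost has no background noise."
text = "Welcome to the show!"

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
sf.write("parler_out.wav", generation.cpu().numpy().squeeze(), model.config.sampling_rate)
```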
 
-Bark is cool but just v6 works great, I tried v9 but its quite robotic and that is sad. 
-
-So Parler is next-its quite cool for prompting 
-
-xTTS-v2 by coquai is cool, however-need to check the license-I think an example is allowed
-
-Torotoise is blocking because it needs HF version that doesnt work with llama-3.2 models so I will probably need to make a seperate env-need to eval if its worth it
+#### Suno/Bark
 
-Side note: The TTS library is a really cool effort!
+Bark is cool, but just v6 works great; I tried v9 but it's quite robotic, and that is sad.
 
Bark-Tests: Best results for speaker/v6 are with:
```
speech_output = model.generate(**inputs, temperature=0.9, semantic_temperature=0.8)
Audio(speech_output[0].cpu().numpy(), rate=sampling_rate)
```
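Expanded into a self-contained form via `transformers`; the input text and voice preset are illustrative assumptions, while the generate kwargs are the ones noted above:

```python
# Self-contained version of the Bark snippet above, using transformers.
# Input text and voice preset are illustrative assumptions.
import torch
from transformers import AutoProcessor, BarkModel
from IPython.display import Audio

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained("suno/bark")
model = BarkModel.from_pretrained("suno/bark").to(device)

inputs = processor("Hello! [laughs] Welcome to the show.",
                   voice_preset="v2/en_speaker_6").to(device)

# semantic_temperature targets Bark's semantic sub-model; the plain
# temperature applies to the remaining sub-models, as noted above.
speech_output = model.generate(**inputs, temperature=0.9, semantic_temperature=0.8)
sampling_rate = model.generation_config.sample_rate
Audio(speech_output[0].cpu().numpy(), rate=sampling_rate)
```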
@@ -53,23 +46,27 @@ Tested sound effects:
- A single hyphen is effective
- Capitalisation makes it louder
 
-Vibe check: 
-- 
-- Seems to have great documentation but still a bit robotic for my liking: https://coqui.ai/blog/tts/open_xtts
 
-- This is THE MOST NATURAL SOUNDING: 
-- This has a lot of promise, even though its robotic, we can use natural voice to add filters or effects: https://huggingface.co/spaces/parler-tts/parler_tts
+### Notes from other models that were tested:
+
+Promising directions to explore in future:
+
+- [MeloTTS](https://huggingface.co/myshell-ai/MeloTTS-English) This is the most popular (ever) on HuggingFace
+- [WhisperSpeech](https://huggingface.co/WhisperSpeech/WhisperSpeech) sounded quite natural as well
+- [F5-TTS](https://github.com/SWivid/F5-TTS) was the latest release at this time; however, it felt a bit robotic
+- E2-TTS: r/LocalLLaMA claims this to be a little better; however, it didn't pass the vibe test
+- [xTTS](https://coqui.ai/blog/tts/open_xtts) It has great documentation and also seems promising
+
+
 
-Higher Barrier to testing (In other words-I was too lazy to test):
-- https://huggingface.co/fishaudio/fish-speech-1.4
-- https://huggingface.co/facebook/mms-tts-eng
-- https://huggingface.co/metavoiceio/metavoice-1B-v0.1
-- https://huggingface.co/nvidia/tts_hifigan
-- https://huggingface.co/speechbrain/tts-tacotron2-ljspeech
+#### Some more models that weren't tested:
 
+In other words, we leave this as an exercise for readers :D
 
-Try later:
-- Whisper Colab: 
-- https://huggingface.co/facebook/mms-tts-eng
-- https://huggingface.co/fishaudio/fish-speech-1.4
-- https://huggingface.co/metavoiceio/metavoice-1B-v0.1
+- [Fish-Speech](https://huggingface.co/fishaudio/fish-speech-1.4)
+- [MMS-TTS-Eng](https://huggingface.co/facebook/mms-tts-eng)
+- [Metavoice](https://huggingface.co/metavoiceio/metavoice-1B-v0.1)
+- [Hifigan](https://huggingface.co/nvidia/tts_hifigan)
+- [TTS-Tacotron2](https://huggingface.co/speechbrain/tts-tacotron2-ljspeech) 
+- [VALL-E X](https://github.com/Plachtaa/VALL-E-X)