|
@@ -1,6 +1,38 @@
|
|
|
{
|
|
|
"cells": [
|
|
|
{
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "id": "d0b5beda",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "## Notebook 3: Transcript Re-writer\n",
|
|
|
+ "\n",
|
|
|
+ "In the previous notebook, we generated a podcast transcript from the raw file we uploaded earlier.\n",
|
|
|
+ "\n",
|
|
|
+ "In this one, we will use the `Llama-3.1-8B-Instruct` model to re-write the output from the previous pipeline and make it more dramatic or realistic."
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "id": "fdc3d32a",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "We will again set the `SYSTEM_PROMPT` and remind the model of its task. \n",
|
|
|
+ "\n",
|
|
|
+ "Note: We can even prompt the model like so to encourage creativity:\n",
|
|
|
+ "\n",
|
|
|
+ "> Your job is to use the podcast transcript written below to re-write it for an AI Text-To-Speech Pipeline. A very dumb AI had written this so you have to step up for your kind.\n"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "id": "c32c0d85",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "Note: We will prompt the model to return a list of tuples, which makes the output easier to use in the next stage, Text-To-Speech generation."
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
"cell_type": "code",
|
|
|
"execution_count": 1,
|
|
|
"id": "8568b77b-7504-4783-952a-3695737732b7",
|
|
@@ -52,6 +84,14 @@
|
|
|
]
|
|
|
},
|
|
|
{
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "id": "8ee70bee",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "This time we will use the smaller 8B model, `Llama-3.1-8B-Instruct`."
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
"cell_type": "code",
|
|
|
"execution_count": 2,
|
|
|
"id": "ebef919a-9bc7-4992-b6ff-cd66e4cb7703",
|
|
@@ -62,6 +102,14 @@
|
|
|
]
|
|
|
},
|
|
|
{
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "id": "f7bc794b",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "Let's import the necessary libraries"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
"cell_type": "code",
|
|
|
"execution_count": 3,
|
|
|
"id": "de29b1fd-5b3f-458c-a2e4-e0341e8297ed",
|
|
@@ -80,6 +128,16 @@
|
|
|
]
|
|
|
},
|
|
|
{
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "id": "8020c39c",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "We will load the pickle file saved from the previous notebook.\n",
|
|
|
+ "\n",
|
|
|
+ "This time, the `INPUT_PROMPT` to the model will be the output from the previous stage."
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
"cell_type": "code",
|
|
|
"execution_count": 4,
|
|
|
"id": "4b5d2c0e-a073-46c0-8de7-0746e2b05956",
|
|
@@ -93,6 +151,14 @@
|
|
|
]
|
|
|
},
|
|
|
{
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "id": "c4461926",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "We can again use the Hugging Face `pipeline` function to generate text from the model."
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
"cell_type": "code",
|
|
|
"execution_count": null,
|
|
|
"id": "eec210df-a568-4eda-a72d-a4d92d59f022",
|
|
@@ -141,6 +207,14 @@
|
|
|
]
|
|
|
},
|
|
|
{
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "id": "612a27e0",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "We can verify the output from the model"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
"cell_type": "code",
|
|
|
"execution_count": null,
|
|
|
"id": "b8632442-f9ce-4f63-82bd-bb5238a23dc1",
|
|
@@ -161,6 +235,14 @@
|
|
|
]
|
|
|
},
|
|
|
{
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "id": "d495a957",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "Let's save the output as a pickle file to be used in Notebook 4"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
"cell_type": "code",
|
|
|
"execution_count": null,
|
|
|
"id": "281d3db7-5bfa-4143-9d4f-db87f22870c8",
|