@@ -1,6 +1,26 @@
{
"cells": [
{
+ "cell_type": "markdown",
+ "id": "de42c49d",
+ "metadata": {},
+ "source": [
+ "## Notebook 2: Transcript Writer\n",
+ "\n",
+ "This notebook uses the `Llama-3.1-70B-Instruct` model to take the cleaned-up text from the previous notebook and convert it into a podcast transcript.\n",
+ "\n",
+ "`SYSTEM_PROMPT` is used to set the model's context, or profile, for the task at hand. Here we prompt it to act as a great podcast transcript writer to assist with our task."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2e576ea9",
+ "metadata": {},
+ "source": [
+ "Experimentation with the `SYSTEM_PROMPT` below is encouraged; the version here worked best for the few examples the flow was tested with:"
+ ]
+ },
+ {
"cell_type": "code",
"execution_count": 1,
"id": "69395317-ad78-47b6-a533-2e8a01313e82",
@@ -36,6 +56,16 @@
]
},
{
+ "cell_type": "markdown",
+ "id": "549aaccb",
+ "metadata": {},
+ "source": [
+ "For those readers who want to flex their money, feel free to try the 405B model here.\n",
+ "\n",
+ "For our GPU-poor friends, you're encouraged to test with a smaller model as well. The 8B model should work well out of the box for this example:"
+ ]
+ },
+ {
"cell_type": "code",
"execution_count": 2,
"id": "08c30139-ff2f-4203-8194-d1b5c50acac5",
@@ -46,6 +76,14 @@
]
},
{
+ "cell_type": "markdown",
+ "id": "fadc7eda",
+ "metadata": {},
+ "source": [
+ "Import the necessary libraries"
+ ]
+ },
+ {
"cell_type": "code",
"execution_count": 3,
"id": "1641060a-d86d-4137-bbbc-ab05cbb1a888",
@@ -65,6 +103,16 @@
]
},
{
+ "cell_type": "markdown",
+ "id": "7865ff7e",
+ "metadata": {},
+ "source": [
+ "Read in the file generated earlier.\n",
+ "\n",
+ "The explicit encoding handling is there to avoid issues with generic PDFs that might have been ingested."
+ ]
+ },
+ {
"cell_type": "code",
"execution_count": 4,
"id": "522fbf7f-8c00-412c-90c7-5cfe2fc94e4c",
@@ -100,6 +148,14 @@
]
},
{
+ "cell_type": "markdown",
+ "id": "66093561",
+ "metadata": {},
+ "source": [
+ "Since we defined the system role earlier, we can now pass the entire file as `INPUT_PROMPT` to the model and have it use that to generate the podcast transcript."
+ ]
+ },
+ {
"cell_type": "code",
"execution_count": 5,
"id": "8119803c-18f9-47cb-b719-2b34ccc5cc41",
@@ -110,6 +166,16 @@
]
},
{
+ "cell_type": "markdown",
+ "id": "9be8dd2c",
+ "metadata": {},
+ "source": [
+ "Hugging Face provides a convenient `pipeline()` method that makes generating text from LLMs easy.\n",
+ "\n",
+ "We will set `temperature` to 1 to encourage creativity and `max_new_tokens` to 8126."
+ ]
+ },
+ {
"cell_type": "code",
"execution_count": 6,
"id": "8915d017-2eab-4256-943c-1f15d937d5dc",
@@ -159,6 +225,14 @@
]
},
{
+ "cell_type": "markdown",
+ "id": "6349e7f3",
+ "metadata": {},
+ "source": [
+ "This is awesome: we can now save and verify the output generated by the model before moving on to the next notebook."
+ ]
+ },
+ {
"cell_type": "code",
"execution_count": 7,
"id": "606ceb10-4f3e-44bb-9277-9bbe3eefd09c",
@@ -210,6 +284,14 @@
]
},
{
+ "cell_type": "markdown",
+ "id": "1e1414fe",
+ "metadata": {},
+ "source": [
+ "Let's save the output as a pickle file and continue on to Notebook 3."
+ ]
+ },
+ {
"cell_type": "code",
"execution_count": 8,
"id": "2130b683-be37-4dae-999b-84eff15c687d",
@@ -226,7 +308,9 @@
"id": "d9bab2f2-f539-435a-ae6a-3c9028489628",
"metadata": {},
"outputs": [],
- "source": []
+ "source": [
+ "#fin"
+ ]
}
],
"metadata": {