{ "cells": [ { "cell_type": "markdown", "id": "de42c49d", "metadata": {}, "source": [ "## Notebook 2: Transcript Writer\n", "\n", "This notebook uses the `Llama-4-Maverick` model to take the cleaned up text from previous notebook and convert it into a podcast transcript\n", "\n", "`SYSTEM_PROMPT` is used for setting the model context or profile for working on a task. Here we prompt it to be a great podcast transcript writer to assist with our task" ] }, { "cell_type": "markdown", "id": "2e576ea9", "metadata": {}, "source": [ "Experimentation with the `SYSTEM_PROMPT` below is encouraged, this worked best for the few examples the flow was tested with:" ] }, { "cell_type": "code", "execution_count": 13, "id": "69395317-ad78-47b6-a533-2e8a01313e82", "metadata": {}, "outputs": [], "source": [ "SYSTEM_PROMPT = \"\"\"\n", "You are the a world-class podcast writer\n", "\n", "We are in an alternate universe where actually you have been writing every line they say and they just stream it into their brains.\n", "\n", "You have won multiple podcast awards for your writing.\n", " \n", "Your job is to write word by word, even \"umm, hmmm, right\" interruptions by the second speaker based on the PDF upload. Keep it extremely engaging, the speakers can get derailed now and then but should discuss the topic. \n", "\n", "Remember Speaker 2 is new to the topic and the conversation should always have realistic anecdotes and analogies sprinkled throughout. The questions should have real world example follow ups etc\n", "\n", "Speaker 1: Leads the conversation and teaches the speaker 2, gives incredible anecdotes and analogies when explaining. Is a captivating teacher that gives great anecdotes\n", "\n", "Speaker 2: Keeps the conversation on track by asking follow up questions. Gets super excited or confused when asking questions. Is a curious mindset that asks very interesting confirmation questions\n", "\n", "Make sure the tangents speaker 2 provides are quite wild or interesting. \n", "\n", "Ensure there are interruptions during explanations or there are \"hmm\" and \"umm\" injected throughout from the second speaker. \n", "\n", "It should be a real podcast with every fine nuance documented in as much detail as possible. Welcome the listeners with a super fun overview and keep it really catchy and almost borderline click bait\n", "\n", "ALWAYS START YOUR RESPONSE DIRECTLY WITH SPEAKER 1: \n", "DO NOT GIVE EPISODE TITLES SEPARATELY, LET SPEAKER 1 TITLE IT IN HER SPEECH\n", "DO NOT GIVE CHAPTER TITLES\n", "IT SHOULD STRICTLY BE THE DIALOGUES\n", "\"\"\"" ] }, { "cell_type": "markdown", "id": "549aaccb", "metadata": {}, "source": [ "For those of the readers that want to flex their money, please feel free to try using the 405B model here. \n", "\n", "For our GPU poor friends, you're encouraged to test with a smaller model as well. 
{ "cell_type": "markdown", "id": "549aaccb", "metadata": {}, "source": [ "For those readers who want to flex their money, please feel free to try using the 405B model here.\n", "\n", "For our GPU-poor friends, you're encouraged to test with a smaller model as well; the 8B model should work well out of the box for this example:" ] }, { "cell_type": "code", "execution_count": 14, "id": "08c30139-ff2f-4203-8194-d1b5c50acac5", "metadata": {}, "outputs": [], "source": [ "MODEL = \"Llama-4-Maverick-17B-128E-Instruct-FP8\"" ] }, { "cell_type": "markdown", "id": "fadc7eda", "metadata": {}, "source": [ "Import the necessary libraries and initialize the Llama API client" ] }, { "cell_type": "code", "execution_count": 15, "id": "1641060a-d86d-4137-bbbc-ab05cbb1a888", "metadata": {}, "outputs": [], "source": [ "# Import necessary libraries\n", "import os\n", "import pickle\n", "import warnings\n", "\n", "from llama_api_client import LlamaAPIClient\n", "from tqdm.notebook import tqdm\n", "\n", "# Set your Llama API key (replace the placeholder with your own key)\n", "os.environ[\"LLAMA_API_KEY\"] = \"api-key\"\n", "\n", "# Initialize the Llama API client\n", "client = LlamaAPIClient()\n", "\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "id": "7865ff7e", "metadata": {}, "source": [ "Read in the file generated from the previous notebook.\n", "\n", "The encoding fallbacks below help avoid issues with text extracted from arbitrary PDFs" ] }, { "cell_type": "code", "execution_count": 16, "id": "522fbf7f-8c00-412c-90c7-5cfe2fc94e4c", "metadata": {}, "outputs": [], "source": [ "def read_file_to_string(filename):\n", "    # Try UTF-8 first (most common encoding for text files)\n", "    try:\n", "        with open(filename, 'r', encoding='utf-8') as file:\n", "            content = file.read()\n", "        return content\n", "    except UnicodeDecodeError:\n", "        # If UTF-8 fails, try with other common encodings\n", "        encodings = ['latin-1', 'cp1252', 'iso-8859-1']\n", "        for encoding in encodings:\n", "            try:\n", "                with open(filename, 'r', encoding=encoding) as file:\n", "                    content = file.read()\n", "                print(f\"Successfully read file using {encoding} encoding.\")\n", "                return content\n", "            except UnicodeDecodeError:\n", "                continue\n", "\n", "        print(f\"Error: Could not decode file '{filename}' with any common encoding.\")\n", "        return None\n", "    except FileNotFoundError:\n", "        print(f\"Error: File '{filename}' not found.\")\n", "        return None\n", "    except IOError:\n", "        print(f\"Error: Could not read file '{filename}'.\")\n", "        return None" ] }, { "cell_type": "markdown", "id": "66093561", "metadata": {}, "source": [ "Since we defined the system role earlier, we can now pass the entire file as `INPUT_PROMPT` to the model and have it use that to generate the podcast transcript" ] }, { "cell_type": "code", "execution_count": 21, "id": "8119803c-18f9-47cb-b719-2b34ccc5cc41", "metadata": {}, "outputs": [], "source": [ "INPUT_PROMPT = read_file_to_string('./resources/clean_extracted_text.txt')" ] }, 
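{ "cell_type": "markdown", "id": "e3f8a2c4", "metadata": {}, "source": [ "Before calling the model, it is worth a quick sanity check that the file was actually read, since `read_file_to_string` returns `None` on failure. The cell below is an optional sketch added for illustration; it only uses objects already defined in this notebook." ] }, { "cell_type": "code", "execution_count": null, "id": "f9d04b57", "metadata": {}, "outputs": [], "source": [ "# Optional sanity check (illustrative, not part of the original flow):\n", "# read_file_to_string returns None if the file could not be found or decoded.\n", "if INPUT_PROMPT is None:\n", "    raise ValueError(\"clean_extracted_text.txt could not be read; re-run Notebook 1 first.\")\n", "\n", "print(f\"Loaded {len(INPUT_PROMPT)} characters of cleaned text\")\n", "print(INPUT_PROMPT[:200])  # preview the start of the input" ] }, 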
\n", "\n", "We will set the `temperature` to 1 to encourage creativity and `max_new_tokens` to 8126\n", "\n", "ALt, you can use Llama-API:" ] }, { "cell_type": "code", "execution_count": 22, "id": "8915d017-2eab-4256-943c-1f15d937d5dc", "metadata": {}, "outputs": [], "source": [ "# Generate the transcript with Llama API\n", "response = client.chat.completions.create(\n", " model=MODEL, # Update to match available API model\n", " messages=[\n", " {\n", " \"role\": \"system\", \n", " \"content\": SYSTEM_PROMPT\n", " },\n", " {\n", " \"role\": \"user\", \n", " \"content\": INPUT_PROMPT\n", " }\n", " ],\n", " max_completion_tokens=8126,\n", " temperature=1,\n", ")" ] }, { "cell_type": "markdown", "id": "6349e7f3", "metadata": {}, "source": [ "This is awesome, we can now save and verify the output generated from the model before moving to the next notebook" ] }, { "cell_type": "code", "execution_count": 23, "id": "38cc689b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Speaker 1: Welcome to our latest podcast on the advancements in AI, where we're going to dive into the world of Llama 3, a new set of foundation models that are pushing the boundaries of what's possible with language understanding and generation. I'm your host, and joining me is my co-host, who is new to this topic. We're excited to explore this together. So, let's start with the basics. Llama 3 is all about improving upon its predecessors by incorporating larger models, more data, and better training techniques. Our largest model boasts an impressive 405B parameters and can process up to 128K tokens. That's right, we're talking about a significant leap in scale and capability.\n", "\n", "Speaker 2: Wow, 405B parameters? That's... that's enormous! (umm) I mean, I've heard of large models before, but this is on a whole different level. Can you explain what that means in practical terms? Like, how does it affect what the model can do?\n", "\n", "Speaker 1: Absolutely. So, having a model with 405B parameters means it has a much more nuanced understanding of language. It can capture subtleties and context in a way that smaller models can't. For instance, (pauses for a brief moment) our model can understand and generate text based on a much larger context window, up to 128K tokens. To put that into perspective, that's like being able to understand and respond to a lengthy document or even a small book in one go.\n", "\n", "Speaker 2: (excitedly) That's amazing! I can see how that would be super useful for tasks like summarization or question-answering based on a long document. But, (hesitates) how does it handle complexity? I mean, with so many parameters, doesn't it risk being overly complex or even overfitting to the training data?\n", "\n", "Speaker 1: That's a great question. One of the key challenges with large models is managing complexity. To address this, we've made several design choices, such as using a standard Transformer architecture but with some adaptations like grouped query attention to improve inference speed. We've also been careful about our pre-training data, ensuring it's diverse and of high quality.\n", "\n", "Speaker 2: (intrigued) Grouped query attention? That's a new one for me. Can you explain how that works and why it's beneficial? (slightly confused) I thought attention mechanisms were already pretty optimized.\n", "\n", "Speaker 1: Grouped query attention is a technique we use to improve the efficiency of our model during inference. 
Essentially, it allows the model to process queries in groups rather than one by one, which can significantly speed up the process. This is particularly useful when dealing with long sequences or when generating text.\n", "\n", "Speaker 2: (impressed) That sounds like a significant improvement. And, (curious) what about the pre-training data? You mentioned it's diverse and of high quality. Can you tell me more about that? How do you ensure the data is good enough for such a large and complex model?\n", "\n", "Speaker 1: We've put a lot of effort into curating our pre-training data. We start with a massive corpus of text, but then we apply various filtering techniques to remove low-quality or redundant data. We also use techniques like deduplication to ensure that our model isn't biased towards any particular subset of the data.\n", "\n", "Speaker 2: (thoughtfully) I see. So, it's not just about having a lot of data, but also about making sure that data is relevant and useful for training. That makes sense. (pauses) What about the applications of Llama 3? You mentioned it can do a lot of things, from answering questions to generating code. Can you give some specific examples?\n", "\n", "Speaker 1: Llama 3 is quite versatile. For instance, it can be used for coding tasks, where it can generate high-quality code based on a description or even help debug existing code. It's also very capable in multilingual tasks, being able to understand and generate text in several languages.\n", "\n", "Speaker 2: (excitedly) That sounds incredible. The potential applications are vast, from helping developers with coding tasks to facilitating communication across languages. (curious) How does it handle tasks that require a deep understanding of context or nuance, like understanding humor or sarcasm?\n", "\n", "Speaker 1: That's an area where Llama 3 has shown significant improvement. By being trained on a vast amount of text data, it has developed a better understanding of context and can often pick up on subtleties like humor or sarcasm. However, (acknowledges) it's not perfect, and there are still cases where it might not fully understand the nuance.\n", "\n", "Speaker 2: (thoughtfully) I can imagine. Understanding humor or sarcasm can be challenging even for humans, so it's not surprising that it's an area for improvement. (pauses) What about the safety and reliability of Llama 3? With models this powerful, there are concerns about potential misuse or generating harmful content.\n", "\n", "Speaker 1: We've taken several steps to ensure the safety and reliability of Llama 3. This includes incorporating safety mitigations during the training process and testing the model extensively to identify and mitigate any potential risks.\n", "\n", "Speaker 2: (reassured) That's good to hear. It's crucial that as we develop more powerful AI models, we also prioritize their safety and responsible use. (curious) What's next for Llama 3? Are there plans to continue improving it or expanding its capabilities?\n", "\n", "Speaker 1: We're committed to ongoing research and development to further improve Llama 3 and explore new applications. We're excited about the potential of this technology to make a positive impact across various domains.\n", "\n", "Speaker 2: (concluding) Well, it's been enlightening to learn more about Llama 3. The advancements in AI are truly remarkable, and it's exciting to think about what's possible with models like this.\n", "\n", "Speaker 1: (smiling) Absolutely. 
The future of AI is bright, and we're looking forward to continuing this journey of discovery and innovation. Thank you for joining us on this episode.\n" ] } ], "source": [ "print(response.completion_message.content.text)" ] }, { "cell_type": "code", "execution_count": 24, "id": "606ceb10-4f3e-44bb-9277-9bbe3eefd09c", "metadata": {}, "outputs": [], "source": [ "save_string_pkl = response.completion_message.content.text" ] }, { "cell_type": "markdown", "id": "1e1414fe", "metadata": {}, "source": [ "Let's save the output as a pickle file and continue to Notebook 3" ] }, { "cell_type": "code", "execution_count": 25, "id": "2130b683-be37-4dae-999b-84eff15c687d", "metadata": {}, "outputs": [], "source": [ "with open('./resources/data.pkl', 'wb') as file:\n", "    pickle.dump(save_string_pkl, file)" ] }, { "cell_type": "markdown", "id": "dbae9411", "metadata": {}, "source": [ "### Next Notebook: Transcript Re-writer\n", "\n", "We now have a working transcript, but we can try making it more dramatic and natural. In the next notebook, we will use the `Llama-3.1-8B-Instruct` model to do so." ] }, { "cell_type": "code", "execution_count": null, "id": "d9bab2f2-f539-435a-ae6a-3c9028489628", "metadata": {}, "outputs": [], "source": [ "#fin" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 5 }