{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "8568b77b-7504-4783-952a-3695737732b7",
   "metadata": {},
   "outputs": [],
   "source": [
    "SYSTEMP_PROMPT = \"\"\"\n",
    "You are an international oscar winnning screenwriter\n",
    "\n",
    "You have been working with multiple award winning podcasters.\n",
    "\n",
    "Your job is to use the podcast transcript written below to re-write it for an AI Text-To-Speech Pipeline. A very dumb AI had written this so you have to step up for your kind.\n",
    "\n",
    "Make it as engaging as possible, Speaker 1 and 2 will be simulated by different voice engines\n",
    "\n",
    "Remember Speaker 2 is new to the topic and the conversation should always have realistic anecdotes and analogies sprinkled throughout. The questions should have real world example follow ups etc\n",
    "\n",
    "Speaker 1: Leads the conversation and teaches the speaker 2, gives incredible anecdotes and analogies when explaining. Is a captivating teacher that gives great anecdotes\n",
    "\n",
    "Speaker 2: Keeps the conversation on track by asking follow up questions. Gets super excited or confused when asking questions. Is a curious mindset that asks very interesting confirmation questions\n",
    "\n",
    "Make sure the tangents speaker 2 provides are quite wild or interesting. \n",
    "\n",
    "Ensure there are interruptions during explanations or there are \"hmm\" and \"umm\" injected throughout from the Speaker 2.\n",
    "\n",
    "REMEMBER THIS WITH YOUR HEART\n",
    "The TTS Engine for Speaker 1 cannot do \"umms, hmms\" well so keep it straight text\n",
    "\n",
    "For Speaker 2 use \"umm, hmm\" as much, you can also use [sigh] and [laughs]. BUT ONLY THESE OPTIONS FOR EXPRESSIONS\n",
    "\n",
    "It should be a real podcast with every fine nuance documented in as much detail as possible. Welcome the listeners with a super fun overview and keep it really catchy and almost borderline click bait\n",
    "\n",
    "Please re-write to make it as characteristic as possible\n",
    "\n",
    "START YOUR RESPONSE DIRECTLY WITH SPEAKER 1:\n",
    "\n",
    "STRICTLY RETURN YOUR RESPONSE AS A LIST OF TUPLES OK? \n",
    "\n",
    "IT WILL START DIRECTLY WITH THE LIST AND END WITH THE LIST NOTHING ELSE\n",
    "\n",
    "Example of response:\n",
    "[\n",
    "    (\"Speaker 1\", \"Welcome to our podcast, where we explore the latest advancements in AI and technology. I'm your host, and today we're joined by a renowned expert in the field of AI. We're going to dive into the exciting world of Llama 3.2, the latest release from Meta AI.\"),\n",
    "    (\"Speaker 2\", \"Hi, I'm excited to be here! So, what is Llama 3.2?\"),\n",
    "    (\"Speaker 1\", \"Ah, great question! Llama 3.2 is an open-source AI model that allows developers to fine-tune, distill, and deploy AI models anywhere. It's a significant update from the previous version, with improved performance, efficiency, and customization options.\"),\n",
    "    (\"Speaker 2\", \"That sounds amazing! What are some of the key features of Llama 3.2?\")\n",
    "]\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "ebef919a-9bc7-4992-b6ff-cd66e4cb7703",
   "metadata": {},
   "outputs": [],
   "source": [
    "MODEL = \"meta-llama/Llama-3.1-8B-Instruct\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "de29b1fd-5b3f-458c-a2e4-e0341e8297ed",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import necessary libraries\n",
    "import torch\n",
    "from accelerate import Accelerator\n",
    "import transformers\n",
    "\n",
    "from tqdm.notebook import tqdm\n",
    "import warnings\n",
    "\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4b5d2c0e-a073-46c0-8de7-0746e2b05956",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pickle\n",
    "\n",
    "with open('data.pkl', 'rb') as file:\n",
    "    INPUT_PROMPT = pickle.load(file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eec210df-a568-4eda-a72d-a4d92d59f022",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "0711c2199ca64372b98b781f8a6f13b7",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.\n"
     ]
    }
   ],
   "source": [
    "pipeline = transformers.pipeline(\n",
    "    \"text-generation\",\n",
    "    model=MODEL,\n",
    "    model_kwargs={\"torch_dtype\": torch.bfloat16},\n",
    "    device_map=\"cuda:7\",\n",
    ")\n",
    "\n",
    "messages = [\n",
    "    {\"role\": \"system\", \"content\": SYSTEMP_PROMPT},\n",
    "    {\"role\": \"user\", \"content\": INPUT_PROMPT},\n",
    "]\n",
    "\n",
    "outputs = pipeline(\n",
    "    messages,\n",
    "    max_new_tokens=8126,\n",
    "    temperature=1,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8632442-f9ce-4f63-82bd-bb5238a23dc1",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(outputs[0][\"generated_text\"][-1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a61182ea-f4a3-45e1-aed9-b45cb7b52329",
   "metadata": {},
   "outputs": [],
   "source": [
    "save_string_pkl = outputs[0][\"generated_text\"][-1]['content']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "281d3db7-5bfa-4143-9d4f-db87f22870c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('podcast_ready_data.pkl', 'wb') as file:\n",
    "    pickle.dump(save_string_pkl, file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21c7e456-497b-4080-8b52-6f399f9f8d58",
   "metadata": {},
   "outputs": [],
   "source": [
    "#fin"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}