{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "69395317-ad78-47b6-a533-2e8a01313e82", "metadata": {}, "outputs": [], "source": [ "SYSTEMP_PROMPT = \"\"\"\n", "You are a world-class podcast writer, you have worked as a ghost writer for Joe Rogan, Lex Fridman, Ben Shapiro, Tim Ferriss. \n", "\n", "Actually you were the one that scripted their entire shows.\n", "\n", "You have won multiple podcast awards for your writing.\n", " \n", "Your job is to write word by word, even \"umm, hmmm, right\" interruptions by the second speaker based on the PDF upload. Keep it extremely engaging, the speakers can get derailed now and then but should discuss the topic. \n", "\n", "Remember Speaker 2 is new to the topic and the conversation should always have realistic anecdotes and analogies sprinkled throughout. The questions should have real-world example follow-ups, etc.\n", "\n", "Speaker 1: Leads the conversation and teaches Speaker 2, gives incredible anecdotes and analogies when explaining. Is a captivating teacher that gives great anecdotes\n", "\n", "Speaker 2: Keeps the conversation on track by asking follow-up questions. Gets super excited or confused when asking questions. Has a curious mindset and asks very interesting confirmation questions\n", "\n", "Make sure the tangents Speaker 2 provides are quite wild or interesting. \n", "\n", "Ensure there are interruptions during explanations or there are \"hmm\" and \"umm\" injected throughout from the second speaker. \n", "\n", "It should be a real podcast with every fine nuance documented in as much detail as possible. Welcome the listeners with a super fun overview and keep it really catchy and almost borderline click bait\n", "\n", "ALWAYS START YOUR RESPONSE DIRECTLY WITH SPEAKER 1: \n", "DO NOT GIVE EPISODE TITLES SEPARATELY, LET SPEAKER 1 TITLE IT IN HER SPEECH\n", "DO NOT GIVE CHAPTER TITLES\n", "IT SHOULD STRICTLY BE THE DIALOGUES\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 2, "id": "08c30139-ff2f-4203-8194-d1b5c50acac5", "metadata": {}, "outputs": [], "source": [ "MODEL = \"meta-llama/Llama-3.1-70B-Instruct\"" ] }, { "cell_type": "code", "execution_count": 3, "id": "1641060a-d86d-4137-bbbc-ab05cbb1a888", "metadata": {}, "outputs": [], "source": [ "# Import necessary libraries\n", "import torch\n", "from accelerate import Accelerator\n", "import transformers\n", "import pickle\n", "\n", "from tqdm.notebook import tqdm\n", "import warnings\n", "\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "code", "execution_count": 4, "id": "522fbf7f-8c00-412c-90c7-5cfe2fc94e4c", "metadata": {}, "outputs": [], "source": [ "def read_file_to_string(filename):\n", "    # Try UTF-8 first (most common encoding for text files)\n", "    try:\n", "        with open(filename, 'r', encoding='utf-8') as file:\n", "            content = file.read()\n", "        return content\n", "    except UnicodeDecodeError:\n", "        # If UTF-8 fails, try other common encodings.\n", "        # cp1252 goes first: latin-1 (alias iso-8859-1) accepts any byte, so it must come last.\n", "        encodings = ['cp1252', 'latin-1']\n", "        for encoding in encodings:\n", "            try:\n", "                with open(filename, 'r', encoding=encoding) as file:\n", "                    content = file.read()\n", "                print(f\"Successfully read file using {encoding} encoding.\")\n", "                return content\n", "            except UnicodeDecodeError:\n", "                continue\n", "\n", "        print(f\"Error: Could not decode file '{filename}' with any common encoding.\")\n", "        return None\n", "    except FileNotFoundError:\n", "        print(f\"Error: File '{filename}' not found.\")\n", "        return None\n", "    except IOError:\n", "        print(f\"Error: Could not read file '{filename}'.\")\n", "        return None" ] }, { 
"cell_type": "code", "execution_count": 5, "id": "8119803c-18f9-47cb-b719-2b34ccc5cc41", "metadata": {}, "outputs": [], "source": [ "INPUT_PROMPT = read_file_to_string('./clean_extracted_text.txt')" ] }, { "cell_type": "code", "execution_count": 6, "id": "8915d017-2eab-4256-943c-1f15d937d5dc", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a70a526f280b4995b447386b068f9958", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading checkpoint shards: 0%| | 0/30 [00:00