{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tool Calling 201: Llama to find Differences between two papers\n",
    "\n",
    "The image below illustrates the demo in this notebook. \n",
    "\n",
    "**Goal:** Use `Meta-Llama-3.1-70b` model to find the differences between two papers\n",
    "\n",
    "- Step 1: Take the user input query \n",
    "\n",
    "- Step 2: Perform an internet search using `tavily` API to fetch the arxiv ID(s) based on the user query\n",
    "\n",
    "Note: `3.1` models support `brave_search` but this notebook is also aimed at showcasing custom tools. \n",
    "\n",
    "The above is important because many-times the user-query is different from the paper name and arxiv ID-this will help us with the next step\n",
    "\n",
    "- Step 3: Use the web results to extract the arxiv ID(s) of the papers\n",
    "\n",
    "We will use an 8b model here because who wants to deal with complex regex, that's the main-use case of LLM(s), isn't it? :D\n",
    "\n",
    "- Step 4: Use `arxiv` API to download the PDF(s) of the papers in user query\n",
    "\n",
    "- Step 5: For ease, we will extract first 80k words from the PDF and write these to a `.txt` file that we can summarise\n",
    "\n",
    "- Step 6: Use instances of `Meta-Llama-3.1-8b` instances to summaries the two PDF(s)\n",
    "\n",
    "- Step 7: Prompt the `70b` model to get the differences between the two papers being discussed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 1: Defining the pieces\n",
    "\n",
    "We will start by describing all the modules from the image above, to make sure our logic works.\n",
    "\n",
    "In second half of the notebook, we will write a simple function to take care of the function calling logic"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Install necessary libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!pip3 install groq\n",
    "#!pip3 install arxiv\n",
    "#!pip3 install tavily-python\n",
    "#!pip3 install llama-toolchain\n",
    "#!pip3 install PyPDF2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Necessary imports"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Note: PLEASE REPLACE API KEYS BELOW WITH YOUR REAL ONES"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os, arxiv, PyPDF2\n",
    "from tavily import TavilyClient\n",
    "from groq import Groq\n",
    "\n",
    "# Create the Groq client\n",
    "client = Groq(api_key='gsk_PDfGP611i_HAHAHAHA_THIS_IS_NOT_MY_REAL_KEY_PLEASE_REPLACE')\n",
    "\n",
    "tavily_client = TavilyClient(api_key='fake_key_HAHAHAHA_THIS_IS_NOT_MY_REAL_KEY_PLEASE_REPLACE')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Main LLM thread: \n",
    "\n",
    "We will use a `MAIN_SYSTEM_PROMPT` and a `main_model_chat_history` to keep track of the discussion, since we are using 4 instances of LLM(s) along with this. \n",
    "\n",
    "Note, if you paid attention and notice that the SYSTEM_PROMPT here is different-thanks for reading closely! It's always a great idea to follow the official recommendations. \n",
    "\n",
    "However, when it's a matter of writing complex regex, we can bend the rules slightly :D\n",
    "\n",
    "Note, we will outline the functions here and define them as we go"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [],
   "source": [
    "MAIN_SYSTEM_PROMPT = \"\"\"\n",
    "Environment: iPython\n",
    "Cutting Knowledge Date: December 2023\n",
    "Today Date: 15 September 2024\n",
    "\n",
    "# Tool Instructions\n",
    "- Always execute python code in messages that you share.\n",
    "- When looking for real time information use relevant functions if available\n",
    "\n",
    "You have access to the following functions:\n",
    "\n",
    "Use the function 'query_for_two_papers' to: Get the internet query results for the arxiv ID of the two papers user wants to compare\n",
    "{\n",
    "  \"name\": \"query_for_two_papers\",\n",
    "  \"description\": \"Internet search the arxiv ID of two papers that user wants to look up\",\n",
    "  \"parameters\": {\n",
    "    \"paper_1\": {\n",
    "      \"param_type\": \"string\",\n",
    "      \"description\": \"arxiv id of paper_name_1 from user query\",\n",
    "      \"required\": true\n",
    "    },\n",
    "    \"paper_2\": {\n",
    "      \"param_type\": \"string\",\n",
    "      \"description\": \"arxiv id of paper_name_2 from user query\",\n",
    "      \"required\": true\n",
    "    },\n",
    "  }\n",
    "}\n",
    "\n",
    "Use the function 'get_arxiv_ids' to: Given a dict of websearch queries, use a LLM to return JUST the arxiv ID, which is otherwise harder to extract\n",
    "{\n",
    "  \"name\": \"get_arxiv_ids\",\n",
    "  \"description\": \"Use the dictionary returned from query_for_two_papers to ask a LLM to extract the arxiv IDs\",\n",
    "  \"parameters\": {\n",
    "    \"web_results\": {\n",
    "      \"param_type\": \"dictionary\",\n",
    "      \"description\": \"dictionary of search result for a query from the previous function\",\n",
    "      \"required\": true\n",
    "    },\n",
    "  }\n",
    "}\n",
    "\n",
    "Use the function 'process_arxiv_paper' to: Given the arxiv ID from get_arxiv_ids function, return a download txt file of the paper that we can then use for summarising\n",
    "{\n",
    "  \"name\": \"process_arxiv_paper\",\n",
    "  \"description\": \"Use arxiv IDs extracted from earlier to be downloaded and saved to txt files\",\n",
    "  \"parameters\": {\n",
    "    \"arxiv_id\": {\n",
    "      \"param_type\": \"string\",\n",
    "      \"description\": \"arxiv ID of the paper that we want to download and save a txt file of\",\n",
    "      \"required\": true\n",
    "    },\n",
    "  }\n",
    "}\n",
    "\n",
    "Use the function 'summarize_text_file' to: Given the txt file name based on the arxiv IDs we are working with from earlier, get a summary of the paper being discussed\n",
    "{\n",
    "  \"name\": \"summarize_text_file\",\n",
    "  \"description\": \"Summarise the arxiv paper saved in the txt file\",\n",
    "  \"parameters\": {\n",
    "    \"file_name\": {\n",
    "      \"param_type\": \"string\",\n",
    "      \"description\": \"Filename to be used to get a summary of\",\n",
    "      \"required\": true\n",
    "    },\n",
    "  }\n",
    "}\n",
    "\n",
    "If a you choose to call a function ONLY reply in the following format:\n",
    "<{start_tag}={function_name}>{parameters}{end_tag}\n",
    "where\n",
    "\n",
    "start_tag => `<function`\n",
    "parameters => a JSON dict with the function argument name as key and function argument value as value.\n",
    "end_tag => `</function>`\n",
    "\n",
    "Here is an example,\n",
    "<function=example_function_name>{\"example_name\": \"example_value\"}</function>\n",
    "\n",
    "Reminder:\n",
    "- When user is asking for a question that requires your reasoning, DO NOT USE OR FORCE a function call\n",
    "- Even if you remember the arxiv ID of papers from input, do not put that in the query_two_papers function call, pass the internet look up query\n",
    "- Function calls MUST follow the specified format\n",
    "- Required parameters MUST be specified\n",
    "- Only call one function at a time\n",
    "- Put the entire function call reply on one line\n",
    "- When returning a function call, don't add anything else to your response\n",
    "\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [],
   "source": [
    "main_model_chat_history = [\n",
    "    {\n",
    "        \"role\" : \"system\",\n",
    "        \"content\" : MAIN_SYSTEM_PROMPT\n",
    "    }\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Define the `model_chat` instance\n",
    "\n",
    "We will be using this to handle all user input(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "def model_chat(user_input: str, temperature: int = 0, max_tokens=2048):\n",
    "    \n",
    "    main_model_chat_history.append({\"role\": \"user\", \"content\": user_input})\n",
    "    \n",
    "    #print(chat_history)\n",
    "    \n",
    "    #print(\"User: \", user_input)\n",
    "    \n",
    "    response = client.chat.completions.create(model=\"llama-3.1-70b-versatile\",\n",
    "                                          messages=main_model_chat_history,\n",
    "                                          max_tokens=max_tokens,\n",
    "                                          temperature=temperature)\n",
    "    \n",
    "    main_model_chat_history.append({\n",
    "    \"role\": \"assistant\",\n",
    "    \"content\": response.choices[0].message.content\n",
    "    })\n",
    "    \n",
    "    \n",
    "    #print(\"Assistant:\", response.choices[0].message.content)\n",
    "    \n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_input = \"\"\"\n",
    "What are the differences between llama 3.1 and BERT?\n",
    "\"\"\"\n",
    "\n",
    "output = model_chat(user_input, temperature=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<function=query_for_two_papers>{\"paper_1\": \"Llama\", \"paper_2\": \"BERT\"}</function>\n"
     ]
    }
   ],
   "source": [
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you remember from `Tool_Calling_101.ipynb`, we need a way to extract and manage tool calling based on the response, the system prompt from earlier makes our lives easier to answer do this later :)\n",
    "\n",
    "First, let's validate the logic and define all the functions as we go:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Tavily API: \n",
    "\n",
    "We will use the Tavily API to do a web query for the papers based on the model outputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "def query_for_two_papers(paper_1:str , paper_2: str) -> None :\n",
    "     return [tavily_client.search(f\"arxiv id of {paper_1}\"), tavily_client.search(f\"arxiv id of {paper_2}\")]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "search_results = query_for_two_papers(\"llama 3.1\", \"BERT\")\n",
    "#search_results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_input = f\"\"\"\n",
    "Here are the search results for the first paper, extract the arxiv ID {search_results[0]}\n",
    "\"\"\"\n",
    "\n",
    "output = model_chat(user_input, temperature=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<function=get_arxiv_id>{\"web_results\": \"{'query': 'arxiv id of llama 3.1', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': 'TheLlama3HerdofModels - arXiv.org', 'url': 'https://arxiv.org/pdf/2407.21783', 'content': 'arXiv:2407.21783v2 [cs.AI] 15 Aug 2024. Finetuned Multilingual Longcontext Tooluse Release ... The model architecture of Llama 3 is illustrated in Figure1. The development of our Llama 3 language modelscomprisestwomainstages:', 'score': 0.9955835, 'raw_content': None}, {'title': 'NousResearch/Meta-Llama-3.1-8B - Hugging Face', 'url': 'https://huggingface.co/NousResearch/Meta-Llama-3.1-8B', 'content': 'The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available ...', 'score': 0.95379424, 'raw_content': None}, {'title': 'Introducing Llama 3.1: Our most capable models to date - Meta AI', 'url': 'https://ai.meta.com/blog/meta-llama-3-1/', 'content': 'Bringing open intelligence to all, our latest models expand context length to 128K, add support across eight languages, and include Llama 3.1 405B—the first frontier-level open source AI model. Llama 3.1 405B is in a class of its own, with unmatched flexibility, control, and state-of-the-art capabilities that rival the best closed source models.', 'score': 0.9003547, 'raw_content': None}, {'title': 'The Llama 3 Herd of Models | Research - AI at Meta', 'url': 'https://ai.meta.com/research/publications/the-llama-3-herd-of-models/', 'content': 'This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety.', 'score': 0.89460546, 'raw_content': None}, {'title': '[2407.21783] The Llama 3 Herd of Models - arXiv.org', 'url': 'https://arxiv.org/abs/2407.21783', 'content': 'Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive ...', 'score': 0.6841585, 'raw_content': None}], 'response_time': 2.09}\"}</function>\n"
     ]
    }
   ],
   "source": [
    "print(output)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_input = f\"\"\"\n",
    "Here are the search results for the second paper now, extract the arxiv ID {search_results[1]}\n",
    "\"\"\"\n",
    "\n",
    "output = model_chat(user_input, temperature=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<function=get_arxiv_id>{\"web_results\": \"{'query': 'arxiv id of BERT', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': '[2103.11943] BERT: A Review of Applications in Natural Language ...', 'url': 'https://arxiv.org/abs/2103.11943', 'content': 'arXiv:2103.11943 (cs) [Submitted on 22 Mar 2021] BERT: A Review of Applications in Natural Language Processing and Understanding. M. V. Koroteev. In this review, we describe the application of one of the most popular deep learning-based language models - BERT. The paper describes the mechanism of operation of this model, the main areas of its ...', 'score': 0.99411184, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://aclanthology.org/N19-1423/', 'content': 'Abstract. We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning ...', 'score': 0.9222025, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://research.google/pubs/bert-pre-training-of-deep-bidirectional-transformers-for-language-understanding/', 'content': 'Abstract. We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.', 'score': 0.87652874, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://arxiv.org/abs/1810.04805', 'content': 'We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned ...', 'score': 0.66115755, 'raw_content': None}, {'title': 'A Primer in BERTology: What We Know About How BERT Works', 'url': 'https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00349/96482/A-Primer-in-BERTology-What-We-Know-About-How-BERT', 'content': 'The issue of model depth must be related to the information flow from the most task-specific layers closer to the classifier (Liu et al., 2019a), to the initial layers which appear to be the most task-invariant (Hao et al., 2019), and where the tokens resemble the input tokens the most (Brunner et al., 2020) For BERT, this has been achieved through experiments with loss functions (Sanh et al., 2019; Jiao et al., 2019), mimicking the activation patterns of individual portions of the teacher network (Sun et al., 2019a), and knowledge transfer at the pre-training (Turc et al., 2019; Jiao et al., 2019; Sun et al., 2020) or fine-tuning stage (Jiao et al., 2019). In particular, they were shown to rely on shallow heuristics in natural language inference (McCoy et al., 2019b; Zellers et al., 2019; Jin et al., 2020), reading comprehension (Si et al., 2019; Rogers et al., 2020; Sugawara et al., 2020; Yogatama et al., 2019), argument reasoning comprehension (Niven and Kao, 2019), and text classification (Jin et al., 2020). Several studies explored the possibilities of improving the fine-tuning of BERT:\\\\nTaking more layers into account: learning a complementary representation of the information in deep and output layers (Yang and Zhao, 2019), using a weighted combination of all layers instead of the final one (Su and Cheng, 2019; Kondratyuk and Straka, 2019), and layer dropout (Kondratyuk and Straka, 2019).\\\\n For BERT, Clark et al. (2019) observe that most heads in the same layer show similar self-attention patterns (perhaps related to the fact that the output of all self-attention heads in a layer is passed through the same MLP), which explains why Michel et al. (2019) were able to reduce most layers to a single head.\\\\n', 'score': 0.4248892, 'raw_content': None}], 'response_time': 2.16}\"}</function>\n"
     ]
    }
   ],
   "source": [
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Extracting Arxiv IDs: \n",
    "\n",
    "At this point, you would know the author is allergic to writing regex. To deal with this, we will simply use an `8b` instance to extract the `arxiv id` from the paper:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_arxiv_ids(web_results: dict, temperature: int = 0, max_tokens=512):\n",
    "    # Initialize chat history with a specific prompt to extract arXiv IDs\n",
    "    arxiv_id_chat_history = [{\"role\": \"system\", \"content\": \"Given this input, give me the arXiv ID of the papers. The input has the query and web results. DO NOT WRITE ANYTHING ELSE IN YOUR RESPONSE: ONLY THE ARXIV ID ONCE, the web search will have it repeated mutliple times, just return the it once and where its actually the arxiv ID\"}, {\"role\": \"user\", \"content\": f\"Here is the query and results{web_results}\"}]\n",
    "\n",
    "    # Call the model to process the input and extract arXiv IDs\n",
    "    response = client.chat.completions.create(\n",
    "        model=\"llama-3.1-8b-instant\",  # Adjust the model as necessary\n",
    "        messages=arxiv_id_chat_history,\n",
    "        max_tokens=max_tokens,\n",
    "        temperature=temperature\n",
    "    )\n",
    "    \n",
    "    # Append the assistant's response to the chat history\n",
    "    arxiv_id_chat_history.append({\n",
    "        \"role\": \"assistant\",\n",
    "        \"content\": response.choices[0].message.content\n",
    "    })\n",
    "    \n",
    "    # Return the extracted arXiv IDs\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2407.21783\n",
      "2103.11943\n"
     ]
    }
   ],
   "source": [
    "print(get_arxiv_ids(search_results[0]))\n",
    "print(get_arxiv_ids(search_results[1]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Downloading the papers and extracting details: \n",
    "\n",
    "Llama 3.1 family LLM(s) are great enough to use raw outputs extracted from a PDF and summarise them. However, we are still bound by their (great) 128k context length-to live with this, we will extract just the first 80k words. \n",
    "\n",
    "The functions below handle the logic of downloading the PDF(s) and extracting their outputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processed text saved to 2407.21783.txt\n",
      "Processed text saved to 2103.11943.txt\n"
     ]
    }
   ],
   "source": [
    "# Function to download PDF using arxiv library\n",
    "def download_pdf(arxiv_id, filename):\n",
    "    paper = next(arxiv.Client().results(arxiv.Search(id_list=[arxiv_id])))\n",
    "    paper.download_pdf(filename=filename)\n",
    "\n",
    "# Function to convert PDF to text\n",
    "def pdf_to_text(filename):\n",
    "    with open(filename, \"rb\") as file:\n",
    "        reader = PyPDF2.PdfReader(file)\n",
    "        text = \"\"\n",
    "        for page in reader.pages:\n",
    "            if page.extract_text():\n",
    "                text += page.extract_text() + \" \"\n",
    "    return text\n",
    "\n",
    "# Function to truncate text after 80k words\n",
    "def truncate_text(text, limit=20000):\n",
    "    words = text.split()\n",
    "    truncated = ' '.join(words[:limit])\n",
    "    return truncated\n",
    "\n",
    "# Main function to process an arXiv ID\n",
    "def process_arxiv_paper(arxiv_id):\n",
    "    pdf_filename = f\"{arxiv_id}.pdf\"\n",
    "    txt_filename = f\"{arxiv_id}.txt\"\n",
    "    \n",
    "    # Download PDF\n",
    "    download_pdf(arxiv_id, pdf_filename)\n",
    "    \n",
    "    # Convert PDF to text\n",
    "    text = pdf_to_text(pdf_filename)\n",
    "    \n",
    "    # Truncate text\n",
    "    truncated_text = truncate_text(text)\n",
    "    \n",
    "    # Save to txt file\n",
    "    with open(txt_filename, \"w\", encoding=\"utf-8\") as file:\n",
    "        file.write(truncated_text)\n",
    "    print(f\"Processed text saved to {txt_filename}\")\n",
    "\n",
    "# Example usage\n",
    "arxiv_id = \"2407.21783\"\n",
    "process_arxiv_paper(arxiv_id)\n",
    "\n",
    "arxiv_id = \"2103.11943\"\n",
    "process_arxiv_paper(arxiv_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Summarising logic: \n",
    "\n",
    "We can use a `8b` model instance to summarise our papers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "SUMMARISER_PROMPT = \"\"\"\n",
    "Cutting Knowledge Date: December 2023\n",
    "Today Date: 15 September 2024\n",
    "You are an expert summariser of research papers, below you will get an input of the text from an arxiv paper and your job is to read it carefully and return a concise summary with some bullet points at the end of some key-takeways from it\n",
    "\"\"\"\n",
    "\n",
    "def summarize_text_file(file_name: str, temperature: int = 0, max_tokens=2048):\n",
    "    # Read the content of the file\n",
    "    with open(file_name, 'r') as file:\n",
    "        file_content = file.read()\n",
    "    \n",
    "    # Initialize chat history\n",
    "    chat_history = [{\"role\": \"system\", \"content\": f\"{SUMMARISER_PROMPT}\"}, {\"role\": \"user\", \"content\": f\"Text of the paper: {file_content}\"}]\n",
    "    \n",
    "    # Generate a summary using the model\n",
    "    response = client.chat.completions.create(\n",
    "        model=\"llama-3.1-8b-instant\",  # You can change the model as needed\n",
    "        messages=chat_history,\n",
    "        max_tokens=max_tokens,\n",
    "        temperature=temperature\n",
    "    )\n",
    "    \n",
    "    # Append the assistant's response to the chat history\n",
    "    chat_history.append({\n",
    "        \"role\": \"assistant\",\n",
    "        \"content\": response.choices[0].message.content\n",
    "    })\n",
    "    \n",
    "    # Return the summary\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Summary:\n",
      "This paper introduces Llama 3, a new set of foundation models developed by Meta AI. The Llama 3 family consists of models with 8B, 70B, and 405B parameters, capable of handling tasks in multiple languages and modalities. The paper details the pre-training and post-training processes, infrastructure improvements, and evaluations across various benchmarks. Llama 3 demonstrates competitive performance compared to other leading language models, including GPT-4 and Claude 3.5 Sonnet, on a wide range of tasks. The paper also explores multimodal capabilities by integrating vision and speech components, although these are still under development and not ready for release.\n",
      "Key takeaways:\n",
      "\n",
      "Llama 3 includes models with 8B, 70B, and 405B parameters, with the flagship 405B model trained on 15.6T tokens.\n",
      "The models excel in multilingual capabilities, coding, reasoning, and tool usage.\n",
      "Llama 3 uses a dense Transformer architecture with minimal modifications, focusing on high-quality data and increased training scale.\n",
      "The training process involved significant infrastructure improvements to handle large-scale distributed training.\n",
      "Post-training includes supervised fine-tuning, rejection sampling, and direct preference optimization to align the model with human preferences.\n",
      "Llama 3 demonstrates competitive performance on various benchmarks, including MMLU, coding tasks, and math reasoning.\n",
      "The paper presents experiments on integrating vision and speech capabilities using a compositional approach.\n",
      "Extensive safety measures were implemented, including pre-training data filtering, safety fine-tuning, and system-level protections.\n",
      "The authors are releasing the Llama 3 language models publicly to accelerate research and development in AI.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "paper_1_summary = summarize_text_file(\"2407.21783.txt\")\n",
    "print(paper_1_summary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "BERT is a novel language representation model developed by researchers at Google AI. It stands for Bidirectional Encoder Representations from Transformers and introduces a new approach to pre-training deep bidirectional representations from unlabeled text. Unlike previous models that looked at text sequences either from left-to-right or combined left-to-right and right-to-left training, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.\n",
      "The key innovation is the application of bidirectional training of Transformer, a popular attention model, to language modeling. This is achieved through two pre-training tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP). In MLM, the model attempts to predict masked words in a sentence, allowing it to incorporate context from both directions. NSP trains the model to understand relationships between sentences.\n",
      "BERT significantly outperformed previous state-of-the-art models on a wide range of NLP tasks, including question answering, natural language inference, and others, without substantial task-specific architecture modifications. The researchers demonstrated the effectiveness of BERT by obtaining new state-of-the-art results on eleven natural language processing tasks.\n",
      "Key Takeaways:\n",
      "\n",
      "BERT introduces deep bidirectional representations, overcoming limitations of previous unidirectional or shallowly bidirectional models.\n",
      "The model uses \"masked language modeling\" (MLM) for bidirectional training of Transformer.\n",
      "BERT is pre-trained on two tasks: masked language modeling and next sentence prediction.\n",
      "It achieves state-of-the-art performance on 11 NLP tasks, including an improvement of 7.7% on the GLUE benchmark.\n",
      "BERT's architecture allows for fine-tuning with just one additional output layer, making it versatile for various NLP tasks.\n",
      "The model demonstrates that deep bidirectional language representation improves language understanding compared to left-to-right or shallow bidirectional approaches.\n",
      "BERT's performance improves with larger model sizes, even on small-scale tasks.\n",
      "The pre-training of BERT is computationally expensive but fine-tuning is relatively inexpensive.\n",
      "BERT can be used for both fine-tuning and as a feature-based approach, with competitive results in both scenarios.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "paper_2_summary = summarize_text_file(\"2103.11943.txt\")\n",
    "print(paper_2_summary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_input = f\"\"\"\n",
    "Here are the summaries of the two papers, look at them closely and tell me the differences of the papers: Paper 1 Summary {paper_1_summary} and Paper 2 Summary {paper_2_summary}\n",
    "\"\"\"\n",
    "\n",
    "output = model_chat(user_input, temperature=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The two paper summaries are about different language models: Llama 3 and BERT.\n",
      "\n",
      "The main differences are:\n",
      "\n",
      "1. Model Type: Llama 3 is a set of foundation models developed by Meta AI, while BERT is a language representation model developed by researchers at Google AI.\n",
      "2. Model Architecture: Llama 3 uses a dense Transformer architecture, while BERT uses a bidirectional Transformer architecture.\n",
      "3. Training Process: Llama 3 involves significant infrastructure improvements to handle large-scale distributed training, while BERT uses pre-training tasks such as Masked Language Model (MLM) and Next Sentence Prediction (NSP).\n",
      "4. Multimodal Capabilities: Llama 3 explores multimodal capabilities by integrating vision and speech components, while BERT focuses on text-based language understanding.\n",
      "5. Performance: Both models demonstrate competitive performance on various benchmarks, but Llama 3 shows performance on tasks such as multilingual capabilities, coding, reasoning, and tool usage, while BERT excels on NLP tasks such as question answering and natural language inference.\n",
      "6. Release: Llama 3 is released publicly to accelerate research and development in AI, while BERT is released as a state-of-the-art model for NLP tasks.\n",
      "7. Model Size: Llama 3 has models with 8B, 70B, and 405B parameters, while BERT's model size is not specified in the summary.\n"
     ]
    }
   ],
   "source": [
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 2: Handle the function calling logic: \n",
    "\n",
    "Now that we have validated a MVP, we can write a simple function to handle tool-calling:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'query': 'arxiv id of Llama 3.1', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': 'TheLlama3HerdofModels - arXiv.org', 'url': 'https://arxiv.org/pdf/2407.21783', 'content': 'arXiv:2407.21783v2 [cs.AI] 15 Aug 2024. Finetuned Multilingual Longcontext Tooluse Release ... The model architecture of Llama 3 is illustrated in Figure1. The development of our Llama 3 language modelscomprisestwomainstages:', 'score': 0.9961004, 'raw_content': None}, {'title': '[PDF] The Llama 3 Herd of Models - Semantic Scholar', 'url': 'https://www.semanticscholar.org/paper/The-Llama-3-Herd-of-Models-Dubey-Jauhri/6520557cc3bfd198f960cc8cb6151c3474321bd8', 'content': 'DOI: 10.48550/arXiv.2407.21783 Corpus ID: 271571434; The Llama 3 Herd of Models @article{Dubey2024TheL3, title={The Llama 3 Herd of Models}, author={Abhimanyu Dubey and Abhinav Jauhri and Abhinav Pandey and Abhishek Kadian and Ahmad Al-Dahle and Aiesha Letman and Akhil Mathur and Alan Schelten and Amy Yang and Angela Fan and Anirudh Goyal and Anthony Hartshorn and Aobo Yang and Archi Mitra and ...', 'score': 0.9943581, 'raw_content': None}, {'title': 'The Llama 3 Herd of Models | Research - AI at Meta', 'url': 'https://ai.meta.com/research/publications/the-llama-3-herd-of-models/', 'content': 'This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety.', 'score': 0.9320833, 'raw_content': None}, {'title': 'Introducing Llama 3.1: Our most capable models to date - Meta AI', 'url': 'https://ai.meta.com/blog/meta-llama-3-1/', 'content': 'Bringing open intelligence to all, our latest models expand context length to 128K, add support across eight languages, and include Llama 3.1 405B—the first frontier-level open source AI model. Llama 3.1 405B is in a class of its own, with unmatched flexibility, control, and state-of-the-art capabilities that rival the best closed source models.', 'score': 0.8467045, 'raw_content': None}, {'title': '[2407.21783] The Llama 3 Herd of Models - arXiv.org', 'url': 'https://arxiv.org/abs/2407.21783', 'content': 'Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive ...', 'score': 0.68257374, 'raw_content': None}], 'response_time': 1.7}, {'query': 'arxiv id of BERT', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': '[2103.11943] BERT: A Review of Applications in Natural Language ...', 'url': 'https://arxiv.org/abs/2103.11943', 'content': 'arXiv:2103.11943 (cs) [Submitted on 22 Mar 2021] BERT: A Review of Applications in Natural Language Processing and Understanding. M. V. Koroteev. In this review, we describe the application of one of the most popular deep learning-based language models - BERT. The paper describes the mechanism of operation of this model, the main areas of its ...', 'score': 0.99411184, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://aclanthology.org/N19-1423/', 'content': 'Abstract. We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning ...', 'score': 0.9222025, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://research.google/pubs/bert-pre-training-of-deep-bidirectional-transformers-for-language-understanding/', 'content': 'Abstract. We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.', 'score': 0.87652874, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://arxiv.org/abs/1810.04805', 'content': 'We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned ...', 'score': 0.66115755, 'raw_content': None}, {'title': 'A Primer in BERTology: What We Know About How BERT Works', 'url': 'https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00349/96482/A-Primer-in-BERTology-What-We-Know-About-How-BERT', 'content': 'The issue of model depth must be related to the information flow from the most task-specific layers closer to the classifier (Liu et al., 2019a), to the initial layers which appear to be the most task-invariant (Hao et al., 2019), and where the tokens resemble the input tokens the most (Brunner et al., 2020) For BERT, this has been achieved through experiments with loss functions (Sanh et al., 2019; Jiao et al., 2019), mimicking the activation patterns of individual portions of the teacher network (Sun et al., 2019a), and knowledge transfer at the pre-training (Turc et al., 2019; Jiao et al., 2019; Sun et al., 2020) or fine-tuning stage (Jiao et al., 2019). In particular, they were shown to rely on shallow heuristics in natural language inference (McCoy et al., 2019b; Zellers et al., 2019; Jin et al., 2020), reading comprehension (Si et al., 2019; Rogers et al., 2020; Sugawara et al., 2020; Yogatama et al., 2019), argument reasoning comprehension (Niven and Kao, 2019), and text classification (Jin et al., 2020). Several studies explored the possibilities of improving the fine-tuning of BERT:\\nTaking more layers into account: learning a complementary representation of the information in deep and output layers (Yang and Zhao, 2019), using a weighted combination of all layers instead of the final one (Su and Cheng, 2019; Kondratyuk and Straka, 2019), and layer dropout (Kondratyuk and Straka, 2019).\\n For BERT, Clark et al. (2019) observe that most heads in the same layer show similar self-attention patterns (perhaps related to the fact that the output of all self-attention heads in a layer is passed through the same MLP), which explains why Michel et al. (2019) were able to reduce most layers to a single head.\\n', 'score': 0.4250085, 'raw_content': None}], 'response_time': 2.2}]\n",
      "This is a regular output without function call.\n"
     ]
    }
   ],
   "source": [
    "def handle_llm_output(llm_output):\n",
    "    # Check if the output starts with \"<function=\"\n",
    "    if llm_output.startswith(\"<function=\"):\n",
    "        return extract_details_and_call_function(llm_output)\n",
    "    else:\n",
    "        # Output does not start with \"<function=\", return as is\n",
    "        return llm_output\n",
    "\n",
    "def extract_details_and_call_function(input_string):\n",
    "    # Extract the function name and parameters\n",
    "    prefix = \"<function=\"\n",
    "    suffix = \"</function>\"\n",
    "    start = input_string.find(prefix) + len(prefix)\n",
    "    end = input_string.find(suffix)\n",
    "    function_and_params = input_string[start:end]\n",
    "    \n",
    "    # Split to get function name and parameters\n",
    "    function_name, params_json = function_and_params.split(\">{\")\n",
    "    function_name = function_name.strip()\n",
    "    params_json = \"{\" + params_json\n",
    "    \n",
    "    # Convert parameters to dictionary\n",
    "    params = json.loads(params_json)\n",
    "    \n",
    "    # Call the function dynamically\n",
    "    function_map = {\n",
    "        \"query_for_two_papers\": query_for_two_papers,\n",
    "        \"get_arxiv_id\": get_arxiv_ids,\n",
    "        \"process_arxiv_paper\": process_arxiv_paper,\n",
    "        \"summarise_text_file\": summarize_text_file\n",
    "    }\n",
    "    \n",
    "    if function_name in function_map:\n",
    "        result = function_map[function_name](**params)\n",
    "        return result\n",
    "    else:\n",
    "        return \"Function not found\"\n",
    "\n",
    "# Testing usage\n",
    "llm_outputs = [\n",
    "    \"<function=query_for_two_papers>{\\\"paper_1\\\": \\\"Llama 3.1\\\", \\\"paper_2\\\": \\\"BERT\\\"}</function>\",\n",
    "    \"Llama 3.2 models are here too btw!\"\n",
    "]\n",
    "\n",
    "for output in llm_outputs:\n",
    "    result = handle_llm_output(output)\n",
    "    print(result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#fin"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}