# Tool Calling 201: Llama to find Differences between two papers

The image below illustrates the demo in this notebook. 

**Goal:** Use the `Meta-Llama-3.1-70b` model to find the differences between two papers

- Step 1: Take the user input query 

- Step 2: Perform an internet search using `tavily` API to fetch the arxiv ID(s) based on the user query

Note: `3.1` models support `brave_search` but this notebook is also aimed at showcasing custom tools. 

The step above is important because the user query often differs from the paper's name and arxiv ID. The search results will help us with the next step.

- Step 3: Use the web results to extract the arxiv ID(s) of the papers

We will use an 8b model here because who wants to deal with complex regex, that's the main-use case of LLM(s), isn't it? :D

- Step 4: Use `arxiv` API to download the PDF(s) of the papers in user query

- Step 5: For ease, we will extract the first 20k words from the PDF (the default `limit` in the code) and write these to a `.txt` file that we can summarise

- Step 6: Use `Meta-Llama-3.1-8b` instances to summarise the two PDF(s)

- Step 7: Prompt the `70b` model to get the differences between the two papers being discussed
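
The steps above can be sketched as one function that wires the stages together. This is only an outline under assumptions: `compare_papers` and its callable parameters are hypothetical names that do not appear elsewhere in this notebook, and the stand-in lambdas replace the real Tavily, arXiv and Groq calls so the wiring can be run without API keys.

```python
from typing import Callable, List

def compare_papers(
    paper_names: List[str],
    search: Callable[[str], dict],
    extract_id: Callable[[dict], str],
    fetch_text: Callable[[str], str],
    summarise: Callable[[str], str],
    compare: Callable[[List[str]], str],
) -> str:
    """Run steps 2-6 for each paper name, then compare the summaries (step 7)."""
    summaries = []
    for name in paper_names:
        web_results = search(f"arxiv id of {name}")   # Step 2: web search
        arxiv_id = extract_id(web_results)            # Step 3: pull out the ID
        text = fetch_text(arxiv_id)                   # Steps 4-5: download + truncate
        summaries.append(summarise(text))             # Step 6: summarise
    return compare(summaries)                         # Step 7: differences

# Stand-in stages (hypothetical) so the flow can be checked offline.
demo = compare_papers(
    ["llama 3.1", "BERT"],
    search=lambda q: {"query": q},
    extract_id=lambda r: r["query"].replace("arxiv id of ", ""),
    fetch_text=lambda i: f"text of {i}",
    summarise=lambda t: t.upper(),
    compare=lambda s: " vs ".join(s),
)
print(demo)
```

Passing the stages in as callables keeps each one replaceable, which mirrors how the notebook validates every piece on its own before connecting them.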

## Part 1: Defining the pieces

We will start by describing all the modules from the image above, to make sure our logic works.

In the second half of the notebook, we will write a simple function to take care of the function calling logic

#### Install necessary libraries

In [37]:
#!pip3 install groq
#!pip3 install arxiv
#!pip3 install tavily-python
#!pip3 install llama-toolchain
#!pip3 install PyPDF2

#### Necessary imports

##### Note: PLEASE REPLACE API KEYS BELOW WITH YOUR REAL ONES

In [38]:
import json
import os

import arxiv
import PyPDF2
from tavily import TavilyClient
from groq import Groq

# Create the Groq client
client = Groq(api_key='gsk_PDfGP611i_HAHAHAHA_THIS_IS_NOT_MY_REAL_KEY_PLEASE_REPLACE')

tavily_client = TavilyClient(api_key='fake_key_HAHAHAHA_THIS_IS_NOT_MY_REAL_KEY_PLEASE_REPLACE')


#### Main LLM thread: 

We will use a `MAIN_SYSTEM_PROMPT` and a `main_model_chat_history` to keep track of the discussion, since we are using 4 instances of LLM(s) along with this. 

Note: if you paid attention and noticed that the system prompt here differs from the official recommendation, thanks for reading closely! It's always a great idea to follow the official recommendations.

However, when it's a matter of writing complex regex, we can bend the rules slightly :D

Note, we will outline the functions here and define them as we go

In [50]:
MAIN_SYSTEM_PROMPT = """
Environment: iPython
Cutting Knowledge Date: December 2023
Today Date: 15 September 2024

# Tool Instructions
- Always execute python code in messages that you share.
- When looking for real time information use relevant functions if available

You have access to the following functions:

Use the function 'query_for_two_papers' to: Get the internet query results for the arxiv ID of the two papers user wants to compare
{
  "name": "query_for_two_papers",
  "description": "Internet search the arxiv ID of two papers that user wants to look up",
  "parameters": {
    "paper_1": {
      "param_type": "string",
      "description": "arxiv id of paper_name_1 from user query",
      "required": true
    },
    "paper_2": {
      "param_type": "string",
      "description": "arxiv id of paper_name_2 from user query",
      "required": true
    }
  }
}

Use the function 'get_arxiv_ids' to: Given a dict of websearch queries, use a LLM to return JUST the arxiv ID, which is otherwise harder to extract
{
  "name": "get_arxiv_ids",
  "description": "Use the dictionary returned from query_for_two_papers to ask a LLM to extract the arxiv IDs",
  "parameters": {
    "web_results": {
      "param_type": "dictionary",
      "description": "dictionary of search result for a query from the previous function",
      "required": true
    }
  }
}

Use the function 'process_arxiv_paper' to: Given the arxiv ID from get_arxiv_ids function, return a download txt file of the paper that we can then use for summarising
{
  "name": "process_arxiv_paper",
  "description": "Use arxiv IDs extracted from earlier to be downloaded and saved to txt files",
  "parameters": {
    "arxiv_id": {
      "param_type": "string",
      "description": "arxiv ID of the paper that we want to download and save a txt file of",
      "required": true
    }
  }
}

Use the function 'summarize_text_file' to: Given the txt file name based on the arxiv IDs we are working with from earlier, get a summary of the paper being discussed
{
  "name": "summarize_text_file",
  "description": "Summarise the arxiv paper saved in the txt file",
  "parameters": {
    "file_name": {
      "param_type": "string",
      "description": "Filename to be used to get a summary of",
      "required": true
    }
  }
}

If you choose to call a function ONLY reply in the following format:
<{start_tag}={function_name}>{parameters}{end_tag}
where

start_tag => `<function`
parameters => a JSON dict with the function argument name as key and function argument value as value.
end_tag => `</function>`

Here is an example,
<function=example_function_name>{"example_name": "example_value"}</function>

Reminder:
- When user is asking for a question that requires your reasoning, DO NOT USE OR FORCE a function call
- Even if you remember the arxiv ID of papers from input, do not put that in the query_for_two_papers function call, pass the internet look up query
- Function calls MUST follow the specified format
- Required parameters MUST be specified
- Only call one function at a time
- Put the entire function call reply on one line
- When returning a function call, don't add anything else to your response

"""
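
Writing the JSON tool blocks by hand makes it easy to leave small syntax slips in the prompt. As a hedged alternative, here is a minimal sketch that renders one block from a Python dict with `json.dumps`. The names `TOOLS` and `render_tool` are hypothetical helpers, not part of this notebook or of any library; only the field names (`name`, `description`, `parameters`, `param_type`, `required`) come from the prompt above.

```python
import json

# Hypothetical container: one entry per tool, same fields as the prompt above.
TOOLS = [
    {
        "name": "process_arxiv_paper",
        "description": "Use arxiv IDs extracted from earlier to be downloaded and saved to txt files",
        "parameters": {
            "arxiv_id": {
                "param_type": "string",
                "description": "arxiv ID of the paper that we want to download and save a txt file of",
                "required": True,
            }
        },
    },
]

def render_tool(tool: dict, purpose: str) -> str:
    """Format one tool as a 'Use the function ...' line followed by its JSON."""
    header = f"Use the function '{tool['name']}' to: {purpose}"
    return header + "\n" + json.dumps(tool, indent=2)

block = render_tool(TOOLS[0], "Download a paper and save it as a txt file")
print(block)
```

Because the JSON is serialised rather than typed, it always parses, and Python's `True` is emitted as JSON `true`.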

In [51]:
main_model_chat_history = [
    {
        "role" : "system",
        "content" : MAIN_SYSTEM_PROMPT
    }
]

#### Define the `model_chat` instance

We will be using this to handle all user input(s)

In [52]:

def model_chat(user_input: str, temperature: float = 0, max_tokens: int = 2048) -> str:
    # Record the user turn so the model sees the whole conversation
    main_model_chat_history.append({"role": "user", "content": user_input})

    response = client.chat.completions.create(
        model="llama-3.1-70b-versatile",
        messages=main_model_chat_history,
        max_tokens=max_tokens,
        temperature=temperature,
    )
    reply = response.choices[0].message.content

    # Record the assistant turn as well, then hand the reply back
    main_model_chat_history.append({"role": "assistant", "content": reply})
    return reply

In [42]:
user_input = """
What are the differences between llama 3.1 and BERT?
"""

output = model_chat(user_input, temperature=1)

In [43]:
print(output)

<function=query_for_two_papers>{"paper_1": "Llama", "paper_2": "BERT"}</function>


If you remember from `Tool_Calling_101.ipynb`, we need a way to extract and manage tool calls based on the response. The system prompt from earlier makes it easier to do this later :)
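
As a rough sketch of that extraction step, the reply format `<function=name>{json}</function>` can be matched with one regular expression. The names `FUNCTION_CALL` and `parse_function_call` are hypothetical, and this is one possible approach rather than the notebook's own handler.

```python
import json
import re
from typing import Optional, Tuple

# Matches <function=NAME>{...}</function>, tolerating surrounding whitespace.
FUNCTION_CALL = re.compile(r"^\s*<function=(\w+)>(\{.*\})</function>\s*$", re.DOTALL)

def parse_function_call(reply: str) -> Optional[Tuple[str, dict]]:
    """Return (function_name, arguments), or None when the reply is plain text."""
    match = FUNCTION_CALL.match(reply)
    if match is None:
        return None
    try:
        arguments = json.loads(match.group(2))
    except json.JSONDecodeError:
        return None  # looked like a call, but the arguments were not valid JSON
    return match.group(1), arguments

call = parse_function_call(
    '<function=query_for_two_papers>{"paper_1": "Llama", "paper_2": "BERT"}</function>'
)
plain = parse_function_call("Llama 3.2 models are here too btw!")
print(call)
print(plain)
```

Returning `None` for anything that is not a well-formed call lets the caller treat the reply as an ordinary answer.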

First, let's validate the logic and define all the functions as we go:

#### Tavily API: 

We will use the Tavily API to do a web query for the papers based on the model outputs

In [29]:
def query_for_two_papers(paper_1: str, paper_2: str) -> list:
    # One Tavily search per paper; each call returns a dict of web results
    return [tavily_client.search(f"arxiv id of {paper_1}"), tavily_client.search(f"arxiv id of {paper_2}")]

In [13]:
search_results = query_for_two_papers("llama 3.1", "BERT")
#search_results

In [14]:
user_input = f"""
Here are the search results for the first paper, extract the arxiv ID {search_results[0]}
"""

output = model_chat(user_input, temperature=1)

In [15]:
print(output)

<function=get_arxiv_id>{"web_results": "{'query': 'arxiv id of llama 3.1', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': 'TheLlama3HerdofModels - arXiv.org', 'url': 'https://arxiv.org/pdf/2407.21783', 'content': 'arXiv:2407.21783v2 [cs.AI] 15 Aug 2024. Finetuned Multilingual Longcontext Tooluse Release ... The model architecture of Llama 3 is illustrated in Figure1. The development of our Llama 3 language modelscomprisestwomainstages:', 'score': 0.9955835, 'raw_content': None}, {'title': 'NousResearch/Meta-Llama-3.1-8B - Hugging Face', 'url': 'https://huggingface.co/NousResearch/Meta-Llama-3.1-8B', 'content': 'The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available ...'

In [16]:
user_input = f"""
Here are the search results for the second paper now, extract the arxiv ID {search_results[1]}
"""

output = model_chat(user_input, temperature=1)

In [17]:
print(output)

<function=get_arxiv_id>{"web_results": "{'query': 'arxiv id of BERT', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': '[2103.11943] BERT: A Review of Applications in Natural Language ...', 'url': 'https://arxiv.org/abs/2103.11943', 'content': 'arXiv:2103.11943 (cs) [Submitted on 22 Mar 2021] BERT: A Review of Applications in Natural Language Processing and Understanding. M. V. Koroteev. In this review, we describe the application of one of the most popular deep learning-based language models - BERT. The paper describes the mechanism of operation of this model, the main areas of its ...', 'score': 0.99411184, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://aclanthology.org/N19-1423/', 'content': 'Abstract. We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation mode

#### Extracting Arxiv IDs: 

At this point, you will know that the author is allergic to writing regex. To deal with this, we will simply use an `8b` instance to extract the arXiv ID from the web results:

In [18]:
def get_arxiv_ids(web_results: dict, temperature: float = 0, max_tokens: int = 512) -> str:
    # Initialize chat history with a specific prompt to extract arXiv IDs
    arxiv_id_chat_history = [{"role": "system", "content": "Given this input, give me the arXiv ID of the papers. The input has the query and web results. DO NOT WRITE ANYTHING ELSE IN YOUR RESPONSE: ONLY THE ARXIV ID ONCE, the web search will have it repeated multiple times, just return it once and only where it is actually the arXiv ID"}, {"role": "user", "content": f"Here is the query and results{web_results}"}]

    # Call the model to process the input and extract arXiv IDs
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",  # Adjust the model as necessary
        messages=arxiv_id_chat_history,
        max_tokens=max_tokens,
        temperature=temperature
    )
    
    # Append the assistant's response to the chat history
    arxiv_id_chat_history.append({
        "role": "assistant",
        "content": response.choices[0].message.content
    })
    
    # Return the extracted arXiv IDs
    return response.choices[0].message.content

In [19]:
print(get_arxiv_ids(search_results[0]))
print(get_arxiv_ids(search_results[1]))

2407.21783
2103.11943
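
For comparison, here is a hedged regex sketch of the same extraction. It assumes the search result is a dict with a `results` list whose items carry `title`, `url` and `content` keys, as in the output printed earlier, and it only recognises new-style identifiers such as `2407.21783`. `ARXIV_ID` and `most_common_arxiv_id` are hypothetical names, and picking the most frequent ID is a heuristic, not a guarantee that it is the paper the user meant.

```python
import re
from collections import Counter
from typing import Optional

# New-style arXiv identifiers: YYMM.NNNNN with an optional version suffix.
ARXIV_ID = re.compile(r"\b(\d{4}\.\d{4,5})(?:v\d+)?\b")

def most_common_arxiv_id(web_results: dict) -> Optional[str]:
    """Pick the identifier that appears most often across the search hits."""
    counts = Counter()
    for hit in web_results.get("results", []):
        text = f"{hit.get('title', '')} {hit.get('url', '')} {hit.get('content', '')}"
        counts.update(ARXIV_ID.findall(text))  # version suffix is dropped by the group
    return counts.most_common(1)[0][0] if counts else None

# Sample shaped like the (abridged) Tavily output shown above.
sample = {
    "query": "arxiv id of llama 3.1",
    "results": [
        {"title": "TheLlama3HerdofModels - arXiv.org",
         "url": "https://arxiv.org/pdf/2407.21783",
         "content": "arXiv:2407.21783v2 [cs.AI] 15 Aug 2024."},
        {"title": "NousResearch/Meta-Llama-3.1-8B - Hugging Face",
         "url": "https://huggingface.co/NousResearch/Meta-Llama-3.1-8B",
         "content": "The Meta Llama 3.1 collection of multilingual large language models"},
    ],
}
found = most_common_arxiv_id(sample)
print(found)
```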


#### Downloading the papers and extracting details: 

Llama 3.1 family LLM(s) are capable enough to summarise the raw text extracted from a PDF. However, we are still bound by their (great) 128k context length. To stay within it, we will extract just the first 20k words (the default `limit` in `truncate_text`).

The functions below handle the logic of downloading the PDF(s) and extracting their outputs

In [24]:
# Function to download PDF using arxiv library
def download_pdf(arxiv_id, filename):
    paper = next(arxiv.Client().results(arxiv.Search(id_list=[arxiv_id])))
    paper.download_pdf(filename=filename)

# Function to convert PDF to text
def pdf_to_text(filename):
    with open(filename, "rb") as file:
        reader = PyPDF2.PdfReader(file)
        text = ""
        for page in reader.pages:
            if page.extract_text():
                text += page.extract_text() + " "
    return text

# Function to truncate text to the first `limit` words (default 20,000)
def truncate_text(text, limit=20000):
    words = text.split()
    truncated = ' '.join(words[:limit])
    return truncated

# Main function to process an arXiv ID
def process_arxiv_paper(arxiv_id):
    pdf_filename = f"{arxiv_id}.pdf"
    txt_filename = f"{arxiv_id}.txt"
    
    # Download PDF
    download_pdf(arxiv_id, pdf_filename)
    
    # Convert PDF to text
    text = pdf_to_text(pdf_filename)
    
    # Truncate text
    truncated_text = truncate_text(text)
    
    # Save to txt file
    with open(txt_filename, "w", encoding="utf-8") as file:
        file.write(truncated_text)
    print(f"Processed text saved to {txt_filename}")

# Example usage
arxiv_id = "2407.21783"
process_arxiv_paper(arxiv_id)

arxiv_id = "2103.11943"
process_arxiv_paper(arxiv_id)

Processed text saved to 2407.21783.txt
Processed text saved to 2103.11943.txt
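
The truncation above counts words, while the context limit is counted in tokens. A rough sketch of how a word limit could be derived from a token budget follows. `word_limit_for_budget` is a hypothetical helper, and the default of 1.3 tokens per word is an assumed rule of thumb, not a measured figure for the Llama tokenizer, so treat the result as an estimate only.

```python
def word_limit_for_budget(context_tokens: int, reserved_tokens: int,
                          tokens_per_word: float = 1.3) -> int:
    """Estimate how many words fit once prompt and reply space is reserved.

    tokens_per_word is an assumption; text extracted from PDFs (formulas,
    broken words) may need noticeably more tokens per word.
    """
    if tokens_per_word <= 0:
        raise ValueError("tokens_per_word must be positive")
    available = max(context_tokens - reserved_tokens, 0)
    return int(available / tokens_per_word)

# 128k context, keeping 4k tokens for the system prompt and the summary.
limit = word_limit_for_budget(128_000, 4_000)
print(limit)
```

The returned value could be passed as `limit` to `truncate_text`; a smaller value leaves more headroom if the estimate turns out to be optimistic.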


#### Summarising logic: 

We can use an `8b` model instance to summarise our papers:

In [25]:

SUMMARISER_PROMPT = """
Cutting Knowledge Date: December 2023
Today Date: 15 September 2024
You are an expert summariser of research papers. Below you will get the text of an arxiv paper as input. Your job is to read it carefully and return a concise summary, followed by a few bullet points with the key takeaways.
"""

def summarize_text_file(file_name: str, temperature: float = 0, max_tokens: int = 2048) -> str:
    # Read the content of the file (written as UTF-8 by process_arxiv_paper)
    with open(file_name, 'r', encoding="utf-8") as file:
        file_content = file.read()
    
    # Initialize chat history
    chat_history = [{"role": "system", "content": f"{SUMMARISER_PROMPT}"}, {"role": "user", "content": f"Text of the paper: {file_content}"}]
    
    # Generate a summary using the model
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",  # You can change the model as needed
        messages=chat_history,
        max_tokens=max_tokens,
        temperature=temperature
    )
    
    # Append the assistant's response to the chat history
    chat_history.append({
        "role": "assistant",
        "content": response.choices[0].message.content
    })
    
    # Return the summary
    return response.choices[0].message.content

In [36]:
paper_1_summary = summarize_text_file("2407.21783.txt")
print(paper_1_summary)


Summary:
This paper introduces Llama 3, a new set of foundation models developed by Meta AI. The Llama 3 family consists of models with 8B, 70B, and 405B parameters, capable of handling tasks in multiple languages and modalities. The paper details the pre-training and post-training processes, infrastructure improvements, and evaluations across various benchmarks. Llama 3 demonstrates competitive performance compared to other leading language models, including GPT-4 and Claude 3.5 Sonnet, on a wide range of tasks. The paper also explores multimodal capabilities by integrating vision and speech components, although these are still under development and not ready for release.
Key takeaways:

Llama 3 includes models with 8B, 70B, and 405B parameters, with the flagship 405B model trained on 15.6T tokens.
The models excel in multilingual capabilities, coding, reasoning, and tool usage.
Llama 3 uses a dense Transformer architecture with minimal modifications, focusing on high-quality data an

In [46]:
paper_2_summary = summarize_text_file("2103.11943.txt")
print(paper_2_summary)


BERT is a novel language representation model developed by researchers at Google AI. It stands for Bidirectional Encoder Representations from Transformers and introduces a new approach to pre-training deep bidirectional representations from unlabeled text. Unlike previous models that looked at text sequences either from left-to-right or combined left-to-right and right-to-left training, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
The key innovation is the application of bidirectional training of Transformer, a popular attention model, to language modeling. This is achieved through two pre-training tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP). In MLM, the model attempts to predict masked words in a sentence, allowing it to incorporate context from both directions. NSP trains the model to understand relationships between sentences.
BERT significantly outperformed previous 

In [56]:
user_input = f"""
Here are the summaries of the two papers, look at them closely and tell me the differences of the papers: Paper 1 Summary {paper_1_summary} and Paper 2 Summary {paper_2_summary}
"""

output = model_chat(user_input, temperature=1)

In [57]:
print(output)

The two paper summaries are about different language models: Llama 3 and BERT.

The main differences are:

1. Model Type: Llama 3 is a set of foundation models developed by Meta AI, while BERT is a language representation model developed by researchers at Google AI.
2. Model Architecture: Llama 3 uses a dense Transformer architecture, while BERT uses a bidirectional Transformer architecture.
3. Training Process: Llama 3 involves significant infrastructure improvements to handle large-scale distributed training, while BERT uses pre-training tasks such as Masked Language Model (MLM) and Next Sentence Prediction (NSP).
4. Multimodal Capabilities: Llama 3 explores multimodal capabilities by integrating vision and speech components, while BERT focuses on text-based language understanding.
5. Performance: Both models demonstrate competitive performance on various benchmarks, but Llama 3 shows performance on tasks such as multilingual capabilities, coding, reasoning, and tool usage, while BER

## Part 2: Handle the function calling logic: 

Now that we have validated an MVP, we can write a simple function to handle tool-calling:

In [33]:
def handle_llm_output(llm_output):
    # Check if the output starts with "<function="
    if llm_output.startswith("<function="):
        return extract_details_and_call_function(llm_output)
    else:
        # Output does not start with "<function=", return as is
        return llm_output

def extract_details_and_call_function(input_string):
    # Extract the function name and parameters
    prefix = "<function="
    suffix = "</function>"
    start = input_string.find(prefix) + len(prefix)
    end = input_string.find(suffix)
    function_and_params = input_string[start:end]
    
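    # Split on the first ">" to separate the function name from the JSON arguments
    function_name, params_json = function_and_params.split(">", 1)
    function_name = function_name.strip()
    
    # Convert parameters to dictionary
    params = json.loads(params_json)
    
    # Call the function dynamically. The model does not always reproduce the
    # names from the system prompt exactly (it emitted "get_arxiv_id" earlier),
    # so the spelling variants are mapped to the same functions.
    function_map = {
        "query_for_two_papers": query_for_two_papers,
        "get_arxiv_ids": get_arxiv_ids,
        "get_arxiv_id": get_arxiv_ids,
        "process_arxiv_paper": process_arxiv_paper,
        "summarize_text_file": summarize_text_file,
        "summarise_text_file": summarize_text_file
    }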
    
    if function_name in function_map:
        result = function_map[function_name](**params)
        return result
    else:
        return "Function not found"

# Testing usage
llm_outputs = [
    "<function=query_for_two_papers>{\"paper_1\": \"Llama 3.1\", \"paper_2\": \"BERT\"}</function>",
    "Llama 3.2 models are here too btw!"
]

for output in llm_outputs:
    result = handle_llm_output(output)
    print(result)

[{'query': 'arxiv id of Llama 3.1', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': 'TheLlama3HerdofModels - arXiv.org', 'url': 'https://arxiv.org/pdf/2407.21783', 'content': 'arXiv:2407.21783v2 [cs.AI] 15 Aug 2024. Finetuned Multilingual Longcontext Tooluse Release ... The model architecture of Llama 3 is illustrated in Figure1. The development of our Llama 3 language modelscomprisestwomainstages:', 'score': 0.9961004, 'raw_content': None}, {'title': '[PDF] The Llama 3 Herd of Models - Semantic Scholar', 'url': 'https://www.semanticscholar.org/paper/The-Llama-3-Herd-of-Models-Dubey-Jauhri/6520557cc3bfd198f960cc8cb6151c3474321bd8', 'content': 'DOI: 10.48550/arXiv.2407.21783 Corpus ID: 271571434; The Llama 3 Herd of Models @article{Dubey2024TheL3, title={The Llama 3 Herd of Models}, author={Abhimanyu Dubey and Abhinav Jauhri and Abhinav Pandey and Abhishek Kadian and Ahmad Al-Dahle and Aiesha Letman and Akhil Mathur and Alan Schelten and Amy Yang and Ang
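
The handler above resolves a single function call. One way to chain several calls is to feed each result back to the model until it replies in plain text. The sketch below is an assumption about how that loop could look, not part of the notebook: `run_tool_loop` is a hypothetical name, `chat` stands in for `model_chat`, and the scripted `fake_chat` replaces the real model so the loop can be run offline.

```python
import json
from typing import Callable, Dict

def run_tool_loop(user_input: str,
                  chat: Callable[[str], str],
                  tools: Dict[str, Callable],
                  max_steps: int = 8) -> str:
    """Alternate between the model and the tools until a plain-text reply."""
    reply = chat(user_input)
    for _ in range(max_steps):
        reply = reply.strip()
        if not (reply.startswith("<function=") and reply.endswith("</function>")):
            return reply  # plain text: the model is done
        body = reply[len("<function="):-len("</function>")]
        name, _, raw_args = body.partition(">")
        if name not in tools:
            reply = chat(f"Function {name} not found")
            continue
        result = tools[name](**json.loads(raw_args))
        reply = chat(f"Result of {name}: {result}")
    return reply  # step budget exhausted; return the last reply as-is

# Scripted stand-in for the model (hypothetical), one reply per call.
script = iter([
    '<function=add>{"a": 2, "b": 3}</function>',
    "The sum is 5.",
])
seen = []

def fake_chat(message: str) -> str:
    seen.append(message)
    return next(script)

answer = run_tool_loop("What is 2 + 3?", fake_chat, {"add": lambda a, b: a + b})
print(answer)
```

The `max_steps` cap is a safety choice so that a model stuck emitting function calls cannot loop forever.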

In [None]:
#fin