# Tool Calling 101

Note: If you are looking for instructions for the `3.2` featherlight models (1B and 3B), please see the respective notebook. This one covers the `3.1` models.

We briefly introduce the `3.2` models at the end.

Note: The new vision models behave the same as the `3.1` models when you talk to them without an image.

This is part 1 of 2 in the tool calling series. This notebook covers the basics of what tool calling is and how to perform it with `Llama 3.1 models`.

Here's what you will learn in this notebook:

- Set up Groq to access the Llama 3.1 70B model
- Avoid common mistakes when performing tool-calling with Llama
- Understand Prompt templates for Tool Calling
- Understand how the tool calls are handled under the hood
- Learn the 3.2 Model Tool Calling Format and Behaviour

In Part 2, we will learn how to build a system that can compare two papers for us.

## What is Tool Calling?

This approach was popularised by the [Gorilla](https://gorilla.cs.berkeley.edu) paper, which showed that Large Language Models can be fine-tuned on API examples to teach them to call an external API.

This is really cool because we can now use an LLM as the "brain" of a system and connect it to external systems to perform actions.

In simpler words, "Llama can order your pizza for you" :) 

With the Llama 3.1 release, the models excel at tool calling and support `brave_search`, `wolfram_alpha` and `code_interpreter` out of the box.

However, first let's take a look at a common mistake.

#### Install and set up Groq dependencies

- Install the `groq` package to access the Llama model(s)
- Configure our client and authenticate with the API key. Note: PLEASE UPDATE YOUR KEY BELOW

In [1]:
#!pip3 install groq
%set_env GROQ_API_KEY=''

In [2]:
import os
from groq import Groq
# Create the Groq client
client = Groq(api_key='gsk_PDfGP611i_HAHAHAHA_THIS_IS_NOT_MY_REAL_KEY_PLEASE_REPLACE')
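
Hard-coding a key in a notebook makes it easy to leak by accident. As a minimal sketch, the key could instead be read from the `GROQ_API_KEY` environment variable set in the first cell. Note that `load_groq_api_key` is a hypothetical helper name used for illustration; it is not part of the `groq` package.

```python
import os


def load_groq_api_key(env_var: str = "GROQ_API_KEY") -> str:
    """Return the API key stored in the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of the groq package.
    """
    # Strip whitespace and stray quotes in case the value was set as '' or "..."
    key = os.environ.get(env_var, "").strip().strip("'\"")
    if not key:
        raise RuntimeError(f"{env_var} is not set; add your key before creating the client")
    return key


# Usage (assumes the Groq import from the cell above):
# client = Groq(api_key=load_groq_api_key())
```

Failing early with a clear message is usually easier to debug than an authentication error returned by the first API call.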

## Common Mistake of Tool-Calling: Incorrect Prompt Template

While Llama 3.1 supports tool calling out of the box, a wrong prompt template can cause unexpected behaviour.

Sometimes, even superheroes need to be reminded of their powers. 

Let's first try to force the behaviour through the user prompt alone.

#### Note: Remember this is the WRONG template. If you are in a rushed copy-pasta sprint, please scroll to the next section to see the right approach

This section shows that the model will not use `brave_search` and `wolfram_alpha` out of the box unless the prompt template is set correctly, even if the model is asked to do so!

In [6]:
SYSTEM_PROMPT = """
Cutting Knowledge Date: December 2023
Today Date: 20 August 2024

You are a helpful assistant
"""

In [7]:
system_prompt = {}
chat_history = []

def model_chat(user_input: str, sys_prompt: str = SYSTEM_PROMPT, temperature: float = 0.7, max_tokens: int = 2048):
    
    chat_history = [
        {
            "role": "system",
            "content": sys_prompt
        }
    ]
    
    chat_history.append({"role": "user", "content": user_input})
    
    response = client.chat.completions.create(model="llama-3.1-70b-versatile",
                                          messages=chat_history,
                                          max_tokens=max_tokens,
                                          temperature=temperature)
    
    chat_history.append({
    "role": "assistant",
    "content": response.choices[0].message.content
    })
    
    
    #print("Assistant:", response.choices[0].message.content)
    
    return response.choices[0].message.content

#### Asking the model about recent news

Since the prompt template is incorrect, the model answers from what it memorised before its knowledge cutoff instead of searching

In [85]:
user_input = """
When is the next elden ring game coming out?
"""

print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))

Assistant: Unfortunately, I don't have information on a specific release date for the next Elden Ring game. However, I can tell you that there have been rumors and speculations about a potential sequel or DLC (Downloadable Content) for Elden Ring.

In June 2022, the game's director, Hidetaka Miyazaki, mentioned that FromSoftware, the developer of Elden Ring, was working on "multiple" new projects, but no official announcements have been made since then.

It's also worth noting that FromSoftware has a history of taking their time to develop new games, and the studio is known for its attention to detail and commitment to quality. So, even if there is a new Elden Ring game in development, it's likely that we won't see it anytime soon.

Keep an eye on official announcements from FromSoftware and Bandai Namco, the publisher of Elden Ring, for any updates on a potential sequel or new game in the series.


#### Asking the model about a Math problem

Again, the model answers based on memory and not tool-calling

In [86]:
user_input = """
When is the square root of 23131231?
"""

print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))


Assistant: To find the square root of 23131231, I'll calculate it for you.

√23131231 ≈ 4813.61


#### Can we solve this using a reminder prompt?

In [87]:
user_input = """
When is the square root of 23131231?

Can you use a tool to solve the question?
"""

print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))


Assistant: I can use a mathematical tool to solve the question.

The square root of 23131231 is:

√23131231 ≈ 4810.51


Looks like we didn't get the `wolfram_alpha` call. Let's try one more time with a stronger prompt:

In [88]:
user_input = """
When is the square root of 23131231?

Can you use a tool to solve the question?

Remember you have been trained on wolfram_alpha
"""

print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))


Assistant: I can use Wolfram Alpha to calculate the square root of 23131231.

According to Wolfram Alpha, the square root of 23131231 is:

√23131231 ≈ 4809.07
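
The three attempts above returned three different values (4813.61, 4810.51 and 4809.07), which suggests the model is estimating rather than calling a tool. A quick local check with Python's standard `math` module, independent of any model or API, shows how far each answer is from the real value:

```python
import math

n = 23131231
root = math.sqrt(n)

# Answers copied from the three model responses above
model_answers = [4813.61, 4810.51, 4809.07]

print(f"sqrt({n}) = {root:.2f}")
for answer in model_answers:
    print(f"model said {answer}: off by {abs(answer - root):.2f}")
```

None of the three responses matches the computed root to two decimal places, even when the model claims to have used Wolfram Alpha.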


### Official Prompt Template 

As you can see, the model doesn't perform tool calling as expected above. This is because we are not following the recommended prompting format.

The Llama Stack is the go-to approach for using the Llama model family and building applications.

Let's first install the `llama_toolchain` Python package to have the Llama CLI available.

In [12]:
#!pip3 install llama-toolchain

#### Now we can learn about the various prompt formats available 

When you run the cell below without a model name, the error message lists the available models. We can then check the prompt format details for a specific model.

In [20]:
!llama model prompt-format 

Traceback (most recent call last):
  File "/opt/miniconda3/bin/llama", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/miniconda3/lib/python3.12/site-packages/llama_toolchain/cli/llama.py", line 44, in main
    parser.run(args)
  File "/opt/miniconda3/lib/python3.12/site-packages/llama_toolchain/cli/llama.py", line 38, in run
    args.func(args)
  File "/opt/miniconda3/lib/python3.12/site-packages/llama_toolchain/cli/model/prompt_format.py", line 59, in _run_model_template_cmd
    raise argparse.ArgumentTypeError(
argparse.ArgumentTypeError: llama3_1 is not a valid Model. Choose one from --
Llama3.1-8B
Llama3.1-70B
Llama3.1-405B
Llama3.1-8B-Instruct
Llama3.1-70B-Instruct
Llama3.1-405B-Instruct
Llama3.2-1B
Llama3.2-3B
Llama3.2-1B-Instruct
Llama3.2-3B-Instruct
Llama3.2-11B-Vision
Llama3.2-90B-Vision
Llama3.2-11B-Vision-Instruct
Llama3.2-90B-Vision-Instruct


In [21]:
!llama model prompt-format -m Llama3.1-8B

Llama 3.1 - Prompt Formats

Tokens

Here is a list of special tokens that are supported by Llama 3.1:

 • <|begin_of_text|>: Specifies the start of the prompt
 • <|end_of_text|>: Model will cease to generate more tokens. This token is generated only by the
   base models.

## Tool Calling: Using the correct Prompt Template

With the Llama CLI we have already learned the right behaviour of the model.

If everything is set up correctly, the model should now prefix function calls with the `<|python_tag|>` token, followed by the actual function call.

This can allow you to manage your function calling logic accordingly. 
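
To make the template concrete, here is a rough sketch of how a list of chat messages could be flattened into a raw Llama 3.1 prompt string. The special tokens follow the published Llama 3.1 prompt format; `render_llama31_prompt` is a hypothetical helper, the tool output is a placeholder, and a hosted chat endpoint typically assembles this string for you (and may use a different role name than `ipython` for tool results).

```python
def render_llama31_prompt(messages: list[dict]) -> str:
    """Render chat messages into a raw Llama 3.1 prompt string (illustrative sketch)."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        role, content = message["role"], message["content"].strip()
        prompt += f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}"
        # A tool call from the assistant ends the message, not the whole turn
        is_tool_call = role == "assistant" and content.startswith("<|python_tag|>")
        prompt += "<|eom_id|>" if is_tool_call else "<|eot_id|>"
    # Leave an open assistant header so the model writes the next message
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


messages = [
    {"role": "system", "content": "Environment: ipython\nTools: brave_search, wolfram_alpha"},
    {"role": "user", "content": "What is the square root of 23131231?"},
    {"role": "assistant", "content": '<|python_tag|>wolfram_alpha.call(query="square root of 23131231")'},
    # Placeholder tool output, fed back under the ipython role
    {"role": "ipython", "content": '{"result": "placeholder tool output"}'},
]
print(render_llama31_prompt(messages))
```

Seeing the rendered string helps explain why the `Environment` and `Tools` lines in the system prompt matter: they are the signal the model was trained to react to.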

Time to test the theory

In [95]:
SYSTEM_PROMPT = """
Environment: iPython
Tools: brave_search, wolfram_alpha
Cutting Knowledge Date: December 2023
Today Date: 15 September 2024
"""

user_input = """
When is the next Elden ring game coming out?
"""

print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))


Assistant: <|python_tag|>brave_search.call(query="Elden Ring sequel release date")


In [96]:
user_input = """
What is the square root of 23131231?
"""

print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))

Assistant: <|python_tag|>wolfram_alpha.call(query="square root of 23131231")


### Using this knowledge in practice

A common misconception about tool calling is that the model executes the tool call itself and returns the output.

This is NOT TRUE: the actual tool call is something that you have to implement. With this knowledge, let's see how we can utilise brave search to answer our original question.

In [97]:
#!pip3 install brave-search

In [98]:
SYSTEM_PROMPT = """
Environment: iPython
Tools: brave_search, wolfram_alpha
Cutting Knowledge Date: December 2023
Today Date: 15 September 2024
"""

user_input = """
What is the square root of 23131231?
"""

print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))

Assistant: <|python_tag|>wolfram_alpha.call(query="square root of 23131231")


In [99]:
print(model_chat(user_input, sys_prompt=SYSTEM_PROMPT))

output = model_chat(user_input, sys_prompt=SYSTEM_PROMPT)

<|python_tag|>wolfram_alpha.call(query="square root of 23131231")


In [102]:
import re

# Extract the function name
fn_name = re.search(r'<\|python_tag\|>(\w+)\.', output).group(1)

# Extract the method
fn_call_method = re.search(r'\.(\w+)\(', output).group(1)

# Extract the arguments
fn_call_args = re.search(r'=\s*([^)]+)', output).group(1)

print(f"Function name: {fn_name}")
print(f"Method: {fn_call_method}")
print(f"Args: {fn_call_args}")

Function name: wolfram_alpha
Method: call
Args: "square root of 23131231"


You can implement this in different ways but the idea is the same: the LLM emits an output starting with `<|python_tag|>`, which should trigger your tool-calling mechanism.

This logic is handled in your program, and the tool's output is then passed back to the model so that it can answer the user.
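
One way to wire this up is a small dispatcher that maps the parsed tool name to a Python function. The sketch below assumes the `tool.call(query="...")` shape seen in the outputs above; `fake_wolfram_alpha`, `fake_brave_search`, `TOOLS` and `dispatch` are hypothetical names, and the tool functions are stubs rather than real API clients.

```python
import re

# Matches outputs such as: <|python_tag|>wolfram_alpha.call(query="...")
TOOL_CALL_RE = re.compile(r'<\|python_tag\|>(\w+)\.call\(query="([^"]*)"\)')


def fake_wolfram_alpha(query: str) -> str:
    # Stub: a real implementation would call the Wolfram Alpha API here
    return f"[stub wolfram_alpha result for: {query}]"


def fake_brave_search(query: str) -> str:
    # Stub: a real implementation would call the Brave Search API here
    return f"[stub brave_search result for: {query}]"


TOOLS = {"wolfram_alpha": fake_wolfram_alpha, "brave_search": fake_brave_search}


def dispatch(model_output: str):
    """Run the tool requested by the model, or return None for a plain answer."""
    match = TOOL_CALL_RE.search(model_output)
    if match is None:
        return None  # No tool call: the text is the final answer
    tool_name, query = match.groups()
    handler = TOOLS.get(tool_name)
    if handler is None:
        return f"Unknown tool: {tool_name}"
    return handler(query)


print(dispatch('<|python_tag|>wolfram_alpha.call(query="square root of 23131231")'))
```

Keeping the tools in a dictionary means a new tool only needs one new entry, and unknown tool names are reported instead of raising.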

### Code interpreter

With the correct prompt template, Llama models can output Python (as well as code in any language that the model has been trained on).

In [54]:
user_input = """

If I can invest 400$ every month at 5% interest rate, how long would it take me to make a 100k$ in investments?
"""

print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))

Assistant: <|python_tag|>import math

# Define the variables
monthly_investment = 400
interest_rate = 0.05
target_amount = 100000

# Calculate the number of months it would take to reach the target amount
months = 0
current_amount = 0
while current_amount < target_amount:
    current_amount += monthly_investment
    current_amount *= 1 + interest_rate / 12  # Compound interest
    months += 1

# Print the result
print(f"It would take {months} months, approximately {months / 12:.2f} years, to reach the target amount of ${target_amount:.2f}.")


Let's validate the model's output by running it:

In [55]:
# Define the variables
monthly_investment = 400
interest_rate = 0.05
target_amount = 100000

# Calculate the number of months it would take to reach the target amount
months = 0
current_amount = 0
while current_amount < target_amount:
    current_amount += monthly_investment
    current_amount *= 1 + interest_rate / 12  # Compound interest
    months += 1

# Print the result
print(f"It would take {months} months, approximately {months / 12:.2f} years, to reach the target amount of ${target_amount:.2f}.")

It would take 172 months, approximately 14.33 years, to reach the target amount of $100000.00.
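
As a cross-check on the loop, the same answer can be derived in closed form. The loop deposits first and then applies one month of interest, so after n months the balance is P(1+r)((1+r)^n - 1)/r, where P is the monthly deposit and r the monthly rate. Solving for n under the same assumptions the model made (5% annual rate, compounded monthly) gives:

```python
import math

monthly_investment = 400
monthly_rate = 0.05 / 12  # Same assumption as the model: annual rate split into 12 periods
target_amount = 100000

# Solve P * (1 + r) * ((1 + r)**n - 1) / r >= target for n
growth_needed = 1 + target_amount * monthly_rate / (monthly_investment * (1 + monthly_rate))
exact_months = math.log(growth_needed) / math.log(1 + monthly_rate)

# Deposits happen in whole months, so round up
months = math.ceil(exact_months)
print(f"{months} months, approximately {months / 12:.2f} years")
```

The formula agrees with the 172 months printed by the generated code, which is a useful sanity check before trusting model-written calculations.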


### 3.2 Models Custom Tool Prompt Format

Life is great because the Llama Team writes great docs for us, so we can conveniently copy-pasta examples from there :)

[Here](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2#-tool-calling-(1b/3b)-) are the docs for your reference that we will be using. 

Exercise for the viewer: Use `llama-toolchain` again to verify the format like we did earlier, and then start the prompt engineering for the small Llamas.

In [3]:
function_definitions = """[
    {
        "name": "get_user_info",
        "description": "Retrieve details for a specific user by their unique identifier. Note that the provided function is in Python 3 syntax.",
        "parameters": {
            "type": "dict",
            "required": [
                "user_id"
            ],
            "properties": {
                "user_id": {
                    "type": "integer",
                    "description": "The unique identifier of the user. It is used to fetch the specific user details from the database."
                },
                "special": {
                    "type": "string",
                    "description": "Any special information or parameters that need to be considered while fetching user details.",
                    "default": "none"
                }
            }
        }
    }
]
"""
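
Hand-written JSON inside a string is easy to break with a missing brace or comma. One option, sketched here with shortened descriptions and hypothetical variable names (`get_user_info_spec`, `function_definitions_json`), is to build the definition as a Python object and let the standard `json` module serialise it:

```python
import json

# Same shape as the definition above, with descriptions shortened for brevity
get_user_info_spec = {
    "name": "get_user_info",
    "description": "Retrieve details for a specific user by their unique identifier.",
    "parameters": {
        "type": "dict",
        "required": ["user_id"],
        "properties": {
            "user_id": {
                "type": "integer",
                "description": "The unique identifier of the user.",
            },
            "special": {
                "type": "string",
                "description": "Any special information to consider.",
                "default": "none",
            },
        },
    },
}

# Serialise for the prompt, then parse it back to confirm it is valid JSON
function_definitions_json = json.dumps([get_user_info_spec], indent=4)
parsed = json.loads(function_definitions_json)
print(parsed[0]["name"], parsed[0]["parameters"]["required"])
```

The round trip through `json.loads` catches malformed definitions before they reach the model.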

In [4]:
system_prompt = """You are an expert in composing functions. You are given a question and a set of possible functions. 
Based on the question, you will need to make one or more function/tool calls to achieve the purpose. 
If none of the function can be used, point it out. If the given question lacks the parameters required by the function,
also point it out. You should only return the function call in tools call sections.

If you decide to invoke any of the function(s), you MUST put it in the format of [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)]\n
You SHOULD NOT include any other text in the response.

Here is a list of functions in JSON format that you can invoke.\n\n{functions}\n""".format(functions=function_definitions)

In [5]:
chat_history = []

def model_chat(user_input: str, sys_prompt: str = system_prompt, temperature: float = 0.7, max_tokens: int = 2048):
    
    chat_history = [
        {
            "role": "system",
            "content": sys_prompt  # Use the argument rather than the global variable
        }
    ]
    
    chat_history.append({"role": "user", "content": user_input})
    
    response = client.chat.completions.create(model="llama-3.2-3b-preview",
                                          messages=chat_history,
                                          max_tokens=max_tokens,
                                          temperature=temperature)
    
    chat_history.append({
    "role": "assistant",
    "content": response.choices[0].message.content
    })
    
    
    #print("Assistant:", response.choices[0].message.content)
    
    return response.choices[0].message.content

Note: We are assuming the following structure for the dataset:

- Name
- Email
- Age 
- Color request

In [6]:
user_input = "Can you retrieve the details for the user with the ID 7890, who has black as their special request?"

print("Assistant:", model_chat(user_input, sys_prompt=system_prompt))

Assistant: [get_user_info(user_id=7890, special='black')]


#### Dummy dataset to make sure our model stays happy :) 

In [7]:
def get_user_info(user_id: int, special: str = "none") -> dict:
    # This is a mock database of users
    user_database = {
        7890: {"name": "Emma Davis", "email": "emma@example.com", "age": 31},
        1234: {"name": "Liam Wilson", "email": "liam@example.com", "age": 28},
        2345: {"name": "Olivia Chen", "email": "olivia@example.com", "age": 35},
        3456: {"name": "Noah Taylor", "email": "noah@example.com", "age": 42},
        4567: {"name": "Ava Martinez", "email": "ava@example.com", "age": 39},
        5678: {"name": "Ethan Brown", "email": "ethan@example.com", "age": 45},
        6789: {"name": "Sophia Kim", "email": "sophia@example.com", "age": 33},
        8901: {"name": "Mason Lee", "email": "mason@example.com", "age": 29},
        9012: {"name": "Isabella Garcia", "email": "isabella@example.com", "age": 37},
        1357: {"name": "James Johnson", "email": "james@example.com", "age": 41}
    }
    
    # Check if the user exists in our mock database
    if user_id in user_database:
        user_data = user_database[user_id]
        
        # Handle the 'special' parameter
        if special != "none":
            user_data["special_info"] = f"Special request: {special}"
        
        return user_data
    else:
        return {"error": "User not found"}

In [8]:
[get_user_info(user_id=7890, special='black')]

[{'name': 'Emma Davis',
  'email': 'emma@example.com',
  'age': 31,
  'special_info': 'Special request: black'}]

### Handling Tool-Calling logic for the model

Hello Regex, my good old friend :) 

With regex, we can write a simple handler for tool calls that returns either the model's plain response or the result of the tool call.

In [9]:
import re
import json

# Assumes get_user_info and system_prompt are defined in the cells above

chat_history = []

def process_response(response):
    function_call_pattern = r'\[(.*?)\((.*?)\)\]'
    function_calls = re.findall(function_call_pattern, response)
    
    if function_calls:
        processed_response = []
        for func_name, args_str in function_calls:
            args_dict = {}
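            for arg in args_str.split(','):
                if '=' not in arg:
                    continue  # Skip empty or positional fragments
                # Split on the first '=' only, so values may contain '='
                key, value = arg.split('=', 1)
                key = key.strip()
                value = value.strip().strip("'\"")
                if value.isdigit():
                    value = int(value)
                args_dict[key] = value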
            
            if func_name == 'get_user_info':
                result = get_user_info(**args_dict)
                processed_response.append(f"Function call result: {json.dumps(result, indent=2)}")
            else:
                processed_response.append(f"Unknown function: {func_name}")
        return "\n".join(processed_response)
    else:
        return response

def model_chat(user_input: str, sys_prompt=system_prompt, temperature: float = 0.7, max_tokens: int = 2048):
    global chat_history
    
    if not chat_history:
        chat_history = [
            {
                "role": "system",
                "content": sys_prompt
            }
        ]
    
    chat_history.append({"role": "user", "content": user_input})
    
    response = client.chat.completions.create(
        model="llama-3.2-3b-preview",
        messages=chat_history,
        max_tokens=max_tokens,
        temperature=temperature
    )
    
    assistant_response = response.choices[0].message.content
    processed_response = process_response(assistant_response)
    
    chat_history.append({
        "role": "assistant",
        "content": assistant_response
    })
    
    return processed_response

In [10]:
user_input = "Can you retrieve the details for the user with the ID 7890, who has black as their special request?"

print("Assistant:", model_chat(user_input, sys_prompt=system_prompt))

Assistant: Function call result: {
  "name": "Emma Davis",
  "email": "emma@example.com",
  "age": 31,
  "special_info": "Special request: black"
}
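
The regex handler works for simple calls but splits on every comma, so it can trip on string values that contain commas, and it only picks up one call per response. Because the 3.2 format looks like a Python list of calls, an alternative worth considering is Python's standard `ast` module. This is a different technique from the regex above, offered as a sketch; `parse_tool_calls` is a hypothetical helper name.

```python
import ast


def parse_tool_calls(response: str) -> list[tuple[str, dict]]:
    """Parse '[func(a=1, b="x"), ...]' into (name, kwargs) pairs without executing it."""
    try:
        tree = ast.parse(response.strip(), mode="eval")
    except SyntaxError:
        return []  # Plain-text answer, not a tool call
    if not isinstance(tree.body, ast.List):
        return []
    calls = []
    for node in tree.body.elts:
        # Accept only simple keyword-argument calls such as func(a=1, b='x')
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)) or node.args:
            return []
        try:
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords if kw.arg}
        except ValueError:
            return []  # An argument was not a plain literal
        calls.append((node.func.id, kwargs))
    return calls


print(parse_tool_calls("[get_user_info(user_id=7890, special='black')]"))
```

Each parsed pair can then be routed through a name check, as `process_response` does, before calling the matching function with `**kwargs`. Nothing from the model output is executed during parsing, since `ast.literal_eval` only accepts literals.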


In [56]:
#fin