|  | hace 10 meses | |
|---|---|---|
| .. | ||
| 1.png | hace 10 meses | |
| 2.png | hace 10 meses | |
| 3.png | hace 10 meses | |
| README.md | hace 10 meses | |
| examples_log.txt | hace 10 meses | |
| functions_prompt.py | hace 10 meses | |
| gmagent.py | hace 10 meses | |
| main.py | hace 10 meses | |
| requirements.txt | hace 10 meses | |
This Gmagent app shows how to build a Gmail agent app powered by Llama 3.1 8B running locally via Ollama (for privacy concern since Gamgent is about your Gmail). We'll start with building from scratch a basic agent with custom tool calling natively supported in Llama 3.1. The end goal is to cover all components of a production-ready agent app, acting as an assistant to your Gmail, with great user experience: intuitive, engaging, efficient and reliable.
Currently implemented features of Gmagent include:
Email is an essential and one top killer app people use every day. A recent State of AI Agents survey by LangChain finds that "The top use cases for agents include performing research and summarization (58%), followed by streamlining tasks for personal productivity or assistance (53.5%)."
Andrew Ng wrote a 5-part Agentic Design Patterns in March 2024 predicting "AI agent workflows will drive massive AI progress this year".
Deloitte published in November 2024 a report on AI agents and multiagent systems stating that "Through their ability to reason, plan, remember and act, AI agents address key limitations of typical language models." and "Executive leaders should make moves now to prepare for and embrace this next era of intelligent organizational transformation."
In the Thanksgiving week, a new startup /dev/agent building the next-gen OS for AI agents was in the spotlight.
So what exactly is an AI agent and how to start building an agent app?
The concept of agent is not new - in the 2010 3rd edition of Russell and Norvig's classic book Artificial Intelligence: A Modern Approach ("Modern" by 2010, two years before the deep learning revolution that started the truly modern AI), an agent is defined as "anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators". These days, AI agent basically means LLM-powered agent - well, if we treat natural language understanding as a type of sensor, LLM agent is still a sub-category of the traditional agent.
Lilian Weng in her popular June 2023 blog LLM Powered Autonomous Agents defines LLM-powered agent system to have four key components:
Andrew Ng describes four agentic design patterns as:
In Deloitte's report, AI agents are reasoning engines that can understand context, plan workflows, connect to external tools and data, and execute actions to achieve a defined goal.
In a November 2024 blog by Letta The AI agents stack, LLM powered agent is described as the combination of tools use, autonomous execution, and memory.
In addition, Harrison Chase defines agent in the blog What is an AI agent as "a system that uses an LLM to decide the control flow of an application."
Yet another simple summary by Felicis of what an agent does is that an agent expands LLMs to go from chat to act: an agent can pair LLMs with external data, multi-step reasoning and planning, and act on the user's behalf.
All in all (see Resources for even more info), agents are systems that take a high-level task, use an LLM as a reasoning and planning engine, with the help of contextual info and long-term memory if needed, to decide what actions to take, reflect and improve on the actions, and eventually execute those actions to accomplish the task.
It's time to see an agent app in action and enjoy some coding. Below is a preview of the questions or requests one may ask Gmagent:
Here is an example interaction log with Gmagent, with some screenshots of the interaction below (the user inputs and Gmagent outputs are after Your ask: and Gmagent: ; what's between ---- are Llama outputs and tool calling results):
If you feel intimated by the steps of the following Enable Gmail API section, you may want to check again the example asks (to see what you can ask to the agent) and the example log (to see the whole conversation with gmagent) - the devil's in the detail and all the glorious description of a powerful trendy agent may not mention the little details one has to deal with to build it.
Download Ollama (available for macOS, Linux, and Windows) here. Then download and test run the Llama 3.1 8B model by running on a Terminal:
ollama run llama3.1
This will download a quantized version of Llama 3.1 of the size 4.7GB.
First, create a Conda or virtual env:
conda create -n gmagent python=3.10
conda activate gmagent
or
python  -m venv gmagent
source gmagent/bin/activate # on Linux, macOS:
source gmagent\Scripts\activate # on Windows
Then install the required Python libraries:
git clone https://github.com/jeffxtang/gmagent
cd gmagent
pip install -r requirements.txt
To run Gmagent, you need to first copy the credentials.json file downloaded and renamed above in Step 6 of Enable Gmail API to the gmagent folder, then run:
python main.py --user_email <your_gmail_address>
The first time you run it, you'll get a prompt like this;
Please visit this URL to authorize this application: https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=xxxx
Enter the authorization code: 
You need to copy the URL above and open it in a browser - if you Sign in with Google using the same Gmail you enabled for the Gmail API, then you'll see "You’ve been given access to an app that’s currently being tested. You should only continue if you know the developer that invited you.", otherwise if you sign in with another Gmail, you'll see "Gmail Agent App has not completed the Google verification process. The app is currently being tested, and can only be accessed by developer-approved testers. If you think you should have access, contact the developer."
In the latter case, go to APIs & Services > OAuth consent screen > Test users, and click the + ADD USERS button, and you'll see this message: While publishing status is set to "Testing", only test users are able to access the app. Allowed user cap prior to app verification is 100, and is counted over the entire lifetime of the app.
After clicking Continue, check the Select all checkbox to enable both settings required for running Gmagent:
View your email messages and settings. 
Manage drafts and send emails.
Finally, copy the Authorization code and paste it to the Terminal, hit Enter and you'll see Gmagent's initial greeting (which will likely differ because the default temperature value 0.8 is used here - see Ollama's model file for detail) such as:
Hello! I'm Gmagent, here to help you manage your Gmail account with ease.
What would you like to do today? Do you want me to:
Check and respond to new emails
Compose a new email
Organize your inbox with filters or labels
Delete unwanted emails
Something else?
Let me know how I can assist you!
Your ask:
If you cancel here and run the command python main.py --user_email <your_gmail_address> again you should see the Gmagent greeting right away without the need to enter an authorization code, unless you enter a different Gmail address for the first time - in fact, for each authorized (added as a test user) Gmail address, a file token_xxxx@gmail.com.pickle will be created which contains the authorized token.
See the example asks and interaction log above for the types of asks you may enter.
Notes here mainly cover how custom functions are defined, how Gmail API based functions are implemented, and how an Agent class is defined to handle memory for contextual chat and perform pre- and post-processing on the tool calling.
The functions_prompt.py defines the following six custom functions, as part of the system prompt (along with examples for each function call spec that Llama should return):
Below is an example function call spec in JSON format, for the user asks such as "do i have emails with attachments larger than 5mb", "any attachments larger than 5mb" or "let me know if i have large attachments over 5mb":
{"name": "list_emails", "parameters": {"query": "has:attachment larger:5mb"}}
Before LLMs, it'd be a REAL pain to cover ALL the possible user natural language inputs that can be and should be all translated into the same semantic representation (if you've done Amazon Alex Skill or Google Assistant development or any pre-LLM NLU work before, you'd know that the JSON format is the same as intent-slots representation). Now LLMs such as Llama do the most heavy lifting in translating a natural language open input into its semantic representation.
But still, if you look at how the list_emails_function (which is used to search for emails based on a user query) is defined below, you'd see a lot of work would be needed to convert the user's asks to the filter values the Gmail API can accept:
list_emails_function = """
{
    "type": "function",
    "function": {
        "name": "list_emails",
        "description": "Return a list of emails matching an optionally specified query.",
        "parameters": {
            "type": "dic",
            "properties": [
                {
                    "maxResults": {
                        "type": "integer",
                        "description": "The default maximum number of emails to return is 100; the maximum allowed value for this field is 500."
                    }
                },              
                {
                    "query": {
                        "type": "string",
                        "description": "One or more keywords in the email subject and body, or one or more filters. There can be 6 types of filters: 1) Field-specific Filters: from, to, cc, bcc, subject; 2) Date Filters: before, after, older than, newer than); 3) Status Filters: read, unread, starred, importatant; 4) Attachment Filters: has, filename or type; 5) Size Filters: larger, smaller; 6) logical operators (or, and, not)."
                    }
                }
            ],
            "required": []
        }
    }
}
"""
In fact, even though many hours of pre-processing work has been done to cover some test examples, not all of the examples in functions_prompt.py,have been covered and tested.
For each defined custom function call, its implementation using the Gmail API is present in gmagent.py. For example, the list_emails is defined as follows:
def list_emails(query='', max_results=100):
    emails = []
    next_page_token = None
    while True:
        response = service.users().messages().list(
            userId=user_id,
            maxResults=max_results,
            pageToken=next_page_token,
            q=query
        ).execute()
        
        if 'messages' in response:
            for msg in response['messages']:
                sender, subject, received_time = get_email_info(msg['id'])
                emails.append(
                    {
                        "message_id": msg['id'],
                        "sender": sender,
                        "subject": subject,
                        "received_time": received_time
                    }
                )
        
        next_page_token = response.get('nextPageToken')
        if not next_page_token:
            break
    
    return emails
The function will be called by our agent after a user ask such as "do i have emails with attachments larger than 5mb" gets Llama's response below:
{"name": "list_emails", "parameters": {"query": "has:attachment larger:5mb"}}
Implemented also in gmagent.py, the Agent class uses 3 instance members to allow for contextual aware asks to Gmagent, making it have short-term memory:
messages: this list holds all the previous user asks and the function call results based on Llama's response to the user asks, making Llama able to answer follow-up questions such as "how about 5mb" (after initial ask "attachments larger than 10mb") or "how about from yyy@gmail.com" (after ask "any emails from xxx@gmail.com).emails: this list holds a list of emails that matches the user query, so follow-up questions such as "what kind of attachments for the email with subject xxx" can be answered.draft_id: this is used to handle the ask "send the draft" after an initial ask such as "draft an email to xxx".The __call__ method of Agent includes the call to Llama with the messages and parses the Llama response if it's a tool calling spec JSON result, or if Llama doesn't return a tool calling spec, it means it doesn't find a custom tool for the user ask so the Llama response is returned directly:
    try:
      res = json.loads(result.split("<|python_tag|>")[-1])
      function_name = res['name']
      parameters = res['parameters']
      return {"function_name": function_name,
              "parameters": parameters}
    except:
      return result
Also implemented there are both pre-processing logic, mainly to convert some parameter values from Llama's responses to what Gmail APIs can accept to make the API calls happy, and post-processing logic to convert function call results to user-friendly natural language.
function_name = result["function_name"]
func = globals()[function_name]
parameters = result["parameters"]
... <pre-processing>
result = func(**parameters)
... <post-processing>
When you try out Gmagent, you'll likely find that further pre- and post-processing still needed to make it production ready. In a great video on Vertical LLM Agents, Jake Heller said "after passes frankly even like 100 tests the odds that it will do on any random distribution of user inputs of the next 100,000, 100% accurately is very high" and "by the time you've dealt with like all the edge cases... there might be dozens of things you build into your application to actually make it work well and then you get to the prompting piece and writing out tests and very specific prompts and the strategy for how you break down a big problem into step by step by step thinking and how you feed in the information how you format that information the right way". That's what all the business logic is about. We'll cover decomposing a complicated ask and multi-step reasoning in a future version of Gmagent, and continue to explore the best possible way to streamline the pre- and post-processing.
When running Gmagent, the detailed Llama returns, pre-processed tool call specs and the actual tool calling results are inside the ------------------------- block, e.g.:
Calling Llama...
Llama returned: {'function_name': 'list_emails', 'parameters': {'query': 'subject:papers to read has:attachment'}}.
Calling tool to access Gmail API: list_emails, {'query': 'subject:papers to read has:attachment'}...
Tool calling returned: [{'message_id': '1936ef72ad3f30e8', 'sender': 'gmagent_tester1@gmail.com', 'subject': 'Fwd: papers to read', 'received_time': '2024-11-27 10:51:51 PST'}, {'message_id': '1936b819706a4923', 'sender': 'Jeff Tang ', 'subject': 'papers to read', 'received_time': '2024-11-26 18:44:19 PST'}]