{ "cells": [ { "cell_type": "markdown", "id": "07c45c6d-d3a2-44c7-8e14-7e57a05e80b6", "metadata": {}, "source": [ "# Chatbot with Conversation History" ] }, { "cell_type": "markdown", "id": "aef74060", "metadata": {}, "source": [ "*Copyright (c) Meta Platforms, Inc. and affiliates.\n", "This software may be used and distributed according to the terms of the Llama Community License Agreement.*" ] }, { "cell_type": "markdown", "id": "ee08f1d9", "metadata": {}, "source": [ "\"Open" ] }, { "cell_type": "markdown", "id": "ede00bda-8bfe-450a-9b06-7b0caa4752f8", "metadata": {}, "source": [ "This tutorial shows you how to build a chatbot with conversation history. Using Llama 4, we will create a conversational agent that takes a URL, understands its content, and allows you have an interactive conversation with it, while maintaining conversation history.\n", "\n", "| Component | Choice | Why |\n", "| :----------------- | :----------------------------------------- | :-------------------- |\n", "| **Model** | `Llama-4-Maverick-17B-128E-Instruct-FP8` | A powerful Mixture-of-Experts (MoE) model ideal for complex instruction-following. Llama 4 Maverick offers superior performance and a massive context window (up to 1M tokens). |\n", "| **Pattern** | In-context learning + sliding window memory | We will pass the entire webpage content directly into the model's context. Llama 4's large context window makes this simple approach viable for even very large pages, often removing the need for a complex RAG system. | \n", "| **Infrastructure** | Meta's official [Llama API](https://llama.developer.meta.com/) | Provides serverless, production-ready access to Llama 4 models using the `llama_api_client` SDK. |\n", "---\n", "\n", "**Note on Inference Providers:** This tutorial uses the Llama API for demonstration purposes. However, you can run Llama 4 models with any preferred inference provider. Common examples include [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html) and [Together AI](https://together.ai/llama). The core logic of this tutorial can be adapted to any of these providers.\n", "\n", "## What you will learn\n", "\n", "- **The fundamentals of chat completion:** How to structure conversations using system, user, and assistant roles.\n", "- **How to manage conversation history:** Implement a sliding window to maintain context in long conversations without exceeding token limits.\n", "- **Practical prompt engineering:** How to guide the model to answer questions based *only* on provided text.\n", "- **How to perform meta-tasks:** Leverage the model to summarize the conversation history." ] }, { "cell_type": "markdown", "id": "f23c1096-c3d8-45b4-99cc-ecc741ae7107", "metadata": {}, "source": [ "## Install dependencies\n", "\n", "You will need a few libraries for this project: `requests` to download webpages, `readability-lxml` to extract the core content, `markdownify` to convert HTML to clean Markdown, `tiktoken` for accurate token counting, and the official `llama-api-client`." 
] }, { "cell_type": "code", "execution_count": 22, "id": "33159f01-510a-4196-b438-a015e4e4e4b5", "metadata": {}, "outputs": [], "source": [ "!uv pip install --quiet requests beautifulsoup4 readability-lxml markdownify tiktoken llama-api-client" ] }, { "cell_type": "markdown", "id": "40362350-96d3-429c-9b1d-e3da54889f4c", "metadata": {}, "source": [ "## Imports & Llama API client setup\n", "\n", "In this tutorial, we will use [Llama API](https://llama.developer.meta.com/) as the inference provider. So, you would first need to get an API key from Llama API if you don't have one already. Then set the Llama API key as an environment variable, such as `LLAMA_API_KEY`, as shown in the example.\n", "\n", "Remember, you can adapt this section to use your preferred inference provider." ] }, { "cell_type": "code", "execution_count": 26, "id": "a1d0e9b5-6d93-4cb1-bca5-f4eba103197d", "metadata": {}, "outputs": [], "source": [ "import os, sys, re, html, textwrap\n", "import requests\n", "from typing import List, Dict\n", "from bs4 import BeautifulSoup\n", "import tiktoken\n", "from readability import Document\n", "from markdownify import markdownify\n", "from llama_api_client import LlamaAPIClient" ] }, { "cell_type": "code", "execution_count": 7, "id": "8853bb9a-6fe4-445f-951d-ddc4e63d9f8e", "metadata": {}, "outputs": [], "source": [ "# --- Llama client ---\n", "API_KEY = os.getenv(\"LLAMA_API_KEY\")\n", "if not API_KEY:\n", " sys.exit(\"❌ Please set the LLAMA_API_KEY environment variable.\")\n", "\n", "client = LlamaAPIClient(api_key=API_KEY)" ] }, { "cell_type": "markdown", "id": "24f0e401-768e-4f5f-965c-acc72837aa0e", "metadata": {}, "source": [ "## Fetch and clean a webpage\n", "\n", "To get high-quality responses from the model, you first need to provide it with high-quality data. Raw HTML contains a lot of \"noise\" (like navigation bars, ads, and scripts) that can distract the model. The following function implements a three-step process to transform a messy webpage into clean, structured Markdown that is ideal for the LLM.\n", "\n", "1. **Extract Core Content:** It uses the `readability` library to pull out the main body of the article, discarding common boilerplate like headers, footers, and sidebars.\n", "2. **Final Cleanup:** It uses `BeautifulSoup` to remove any remaining `