{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "FOs91Ure71js" }, "source": [ "# Open Contextual RAG\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/togethercomputer/together-cookbook/blob/main/Open_Contextual_RAG.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "id": "uvlBTGL58Cd3" }, "source": [ "## Introduction\n", "\n", "[Contextual Retrieval](https://www.anthropic.com/news/contextual-retrieval) is a chunk augmentation technique that uses a LLM to ehance each chunk.\n", "\n", "\n", "\n", "Here's an overview of how it works. 👇\n", "\n", "Contextual RAG:\n", "\n", "1. For every chunk - prepend an explanatory context snippet that situates the chunk within the rest\n", " of the document. -> Get a small cost effective LLM to do this.\n", "\n", "2. Hybrid Search: Embed the chunk using both sparse (keyword) and dense(semantic) embeddings.\n", "\n", "3. Perform rank fusion using an algorithm like Reciprocal Rank Fusion(RRF).\n", "\n", "4. Retrieve top 150 chunks and pass those to a Reranker to obtain top 20 chunks.\n", "\n", "5. Pass top 20 chunks to LLM to generate an answer.\n", "\n", "Below we implement each step in this process using Open Source models." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To breakdown the concept further we break down the process into a one-time indexing step and a query time step.\n", "\n", "Data Ingestion Phase:\n", "\n", "\n", "\n", "1. Data processing and chunking\n", "2. Context generation using a quantized Llama 3B Model\n", "3. Vector Embedding and Index Generation\n", "4. BM25 Keyword Index Generation\n", "\n", "\n", "At Query Time:\n", "\n", "\n", "\n", "1. Perform retreival using both indices and combine them using RRF\n", "2. Reranker to improve retreival quality\n", "3. 
Generation with Llama 405b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Install Libraries" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "1LkLFXDet80v", "outputId": "f9caad45-918f-4c64-ac72-0465df381848" }, "outputs": [], "source": [ "!pip install together # To access open 
source LLMs\n", "!pip install --upgrade tiktoken # To count total token counts\n", "!pip install beautifulsoup4 # To scrape documents to RAG over\n", "!pip install bm25s # To implement out key-word BM25 search" ] }, { "cell_type": "markdown", "metadata": { "id": "H2kyKpp037zU" }, "source": [ "### 1. Data Processing and Chunking\n", "\n", "We will RAG over Paul Grahams latest essay titled [**Founder Mode**](https://paulgraham.com/foundermode.html)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "FA6zEEIFEOaT" }, "outputs": [], "source": [ "# Let's download the essay from Paul Graham's website\n", "\n", "import requests\n", "from bs4 import BeautifulSoup\n", "\n", "def scrape_pg_essay():\n", "\n", " url = 'https://paulgraham.com/foundermode.html'\n", "\n", " try:\n", " # Send GET request to the URL\n", " response = requests.get(url)\n", " response.raise_for_status() # Raise an error for bad status codes\n", "\n", " # Parse the HTML content\n", " soup = BeautifulSoup(response.text, 'html.parser')\n", "\n", " # Paul Graham's essays typically have the main content in a font tag\n", " # You might need to adjust this selector based on the actual HTML structure\n", " content = soup.find('font')\n", "\n", " if content:\n", " # Extract and clean the text\n", " text = content.get_text()\n", " # Remove extra whitespace and normalize line breaks\n", " text = ' '.join(text.split())\n", " return text\n", " else:\n", " return \"Could not find the main content of the essay.\"\n", "\n", " except requests.RequestException as e:\n", " return f\"Error fetching the webpage: {e}\"\n", "\n", "# Scrape the essay\n", "pg_essay = scrape_pg_essay()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "vSHFejIIEUC0", "outputId": "f9a87bda-4237-410c-ff4a-0f5e42c1864b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "September 2024At a YC event last week Brian Chesky gave a talk that everyone who was there will remember. Most founders I talked to afterward said it was the best they'd ever heard. Ron Conway, for the first time in his life, forgot to take notes. I'm not going to try to reproduce it here. Instead I want to talk about a question it raised.The theme of Brian's talk was that the conventional wisdom about how to run larger companies is mistaken. As Airbnb grew, well-meaning people advised him that he had to run the company in a certain way for it to scale. Their advice could be optimistically summarized as \"hire good people and give them room to do their jobs.\" He followed this advice and the results were disastrous. So he had to figure out a better way on his own, which he did partly by studying how Steve Jobs ran Apple. So far it seems to be working. 
Airbnb's free cash flow margin is now among the best in Silicon Valley.The audience at this event included a lot of the most successful fo\n" ] } ], "source": [ "print(pg_essay[:1000])" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "id": "thecu6QAzt_d" }, "outputs": [], "source": [ "# We can get away with naive fixed-size chunking as the context generation will add meaning to these chunks\n", "\n", "def create_chunks(document, chunk_size=300, overlap=50):\n", " return [document[i : i + chunk_size] for i in range(0, len(document), chunk_size - overlap)]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "KzWDTf-SGAMv", "outputId": "8e114fe2-e2b4-492b-cd10-7101474e6d5d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Chunk 1: September 2024At a YC event last week Brian Chesky gave a talk that everyone who was there will remember. Most founders I talked to afterward said it was the best they'd ever heard. Ron Conway, for the first time in his life, forgot to take notes. I'\n", "Chunk 2: life, forgot to take notes. I'm not going to try to reproduce it here. Instead I want to talk about a question it raised.The theme of Brian's talk was that the conventional wisdom about how to run larger companies is mistaken. As Airbnb grew, well-me\n", "Chunk 3: taken. As Airbnb grew, well-meaning people advised him that he had to run the company in a certain way for it to scale. Their advice could be optimistically summarized as \"hire good people and give them room to do their jobs.\" He followed this advice\n", "Chunk 4: jobs.\" He followed this advice and the results were disastrous. So he had to figure out a better way on his own, which he did partly by studying how Steve Jobs ran Apple. So far it seems to be working. Airbnb's free cash flow margin is now among the \n", "Chunk 5: flow margin is now among the best in Silicon Valley.The audience at this event included a lot of the most successful founders we've funded, and one after another said that the same thing had happened to them. They'd been given the same advice about \n", "Chunk 6: n given the same advice about how to run their companies as they grew, but instead of helping their companies, it had damaged them.Why was everyone telling these founders the wrong thing? That was the big mystery to me. And after mulling it over for \n", "Chunk 7: And after mulling it over for a bit I figured out the answer: what they were being told was how to run a company you hadn't founded — how to run a company if you're merely a professional manager. But this m.o. is so much less effective that to founde\n", "Chunk 8: less effective that to founders it feels broken. There are things founders can do that managers can't, and not doing them feels wrong to founders, because it is.In effect there are two different ways to run a company: founder mode and manager mode. \n", "Chunk 9: ounder mode and manager mode. Till now most people even in Silicon Valley have implicitly assumed that scaling a startup meant switching to manager mode. But we can infer the existence of another mode from the dismay of founders who've tried it, and \n", "Chunk 10: founders who've tried it, and the success of their attempts to escape from it.There are as far as I know no books specifically about founder mode. Business schools don't know it exists. 
All we have so far are the experiments of individual founders wh\n", "Chunk 11: ents of individual founders who've been figuring it out for themselves. But now that we know what we're looking for, we can search for it. I hope in a few years founder mode will be as well understood as manager mode. We can already guess at some of \n", "Chunk 12: can already guess at some of the ways it will differ.The way managers are taught to run companies seems to be like modular design in the sense that you treat subtrees of the org chart as black boxes. You tell your direct reports what to do, and it's\n", "Chunk 13: t reports what to do, and it's up to them to figure out how. But you don't get involved in the details of what they do. That would be micromanaging them, which is bad.Hire good people and give them room to do their jobs. Sounds great when it's descri\n", "Chunk 14: Sounds great when it's described that way, doesn't it? Except in practice, judging from the report of founder after founder, what this often turns out to mean is: hire professional fakers and let them drive the company into the ground.One theme I no\n", "Chunk 15: into the ground.One theme I noticed both in Brian's talk and when talking to founders afterward was the idea of being gaslit. Founders feel like they're being gaslit from both sides — by the people telling them they have to run their companies like m\n", "Chunk 16: to run their companies like managers, and by the people working for them when they do. Usually when everyone around you disagrees with you, your default assumption should be that you're mistaken. But this is one of the rare exceptions. VCs who haven\n", "Chunk 17: rare exceptions. VCs who haven't been founders themselves don't know how founders should run companies, and C-level execs, as a class, include some of the most skillful liars in the world. [1]Whatever founder mode consists of, it's pretty clear that \n", "Chunk 18: ts of, it's pretty clear that it's going to break the principle that the CEO should engage with the company only via his or her direct reports. \"Skip-level\" meetings will become the norm instead of a practice so unusual that there's a name for it. An\n", "Chunk 19: that there's a name for it. And once you abandon that constraint there are a huge number of permutations to choose from.For example, Steve Jobs used to run an annual retreat for what he considered the 100 most important people at Apple, and these wer\n", "Chunk 20: people at Apple, and these were not the 100 people highest on the org chart. Can you imagine the force of will it would take to do this at the average company? And yet imagine how useful such a thing could be. It could make a big company feel like a \n", "Chunk 21: ake a big company feel like a startup. Steve presumably wouldn't have kept having these retreats if they didn't work. But I've never heard of another company doing this. So is it a good idea, or a bad one? We still don't know. That's how little we kn\n", "Chunk 22: know. That's how little we know about founder mode. [2]Obviously founders can't keep running a 2000 person company the way they ran it when it had 20. There's going to have to be some amount of delegation. Where the borders of autonomy end up, and h\n", "Chunk 23: ders of autonomy end up, and how sharp they are, will probably vary from company to company. They'll even vary from time to time within the same company, as managers earn trust. So founder mode will be more complicated than manager mode. But it will \n", "Chunk 24: han manager mode. 
But it will also work better. We already know that from the examples of individual founders groping their way toward it.Indeed, another prediction I'll make about founder mode is that once we figure out what it is, we'll find that a\n", "Chunk 25: what it is, we'll find that a number of individual founders were already most of the way there — except that in doing what they did they were regarded by many as eccentric or worse. [3]Curiously enough it's an encouraging thought that we still know \n", "Chunk 26: ng thought that we still know so little about founder mode. Look at what founders have achieved already, and yet they've achieved this against a headwind of bad advice. Imagine what they'll do once we can tell them how to run their companies like Ste\n", "Chunk 27: o run their companies like Steve Jobs instead of John Sculley.Notes[1] The more diplomatic way of phrasing this statement would be to say that experienced C-level execs are often very skilled at managing up. And I don't think anyone with knowledge of\n", "Chunk 28: think anyone with knowledge of this world would dispute that.[2] If the practice of having such retreats became so widespread that even mature companies dominated by politics started to do it, we could quantify the senescence of companies by the aver\n", "Chunk 29: cence of companies by the average depth on the org chart of those invited.[3] I also have another less optimistic prediction: as soon as the concept of founder mode becomes established, people will start misusing it. Founders who are unable to delega\n", "Chunk 30: nders who are unable to delegate even things they should will use founder mode as the excuse. Or managers who aren't founders will decide they should try to act like founders. That may even work, to some extent, but the results will be messy when it \n", "Chunk 31: results will be messy when it doesn't; the modular approach does at least limit the damage a bad CEO can do.Thanks to Brian Chesky, Patrick Collison, Ron Conway, Jessica Livingston, Elon Musk, Ryan Petersen, Harj Taggar, and Garry Tan for reading dra\n", "Chunk 32: and Garry Tan for reading drafts of this.\n" ] } ], "source": [ "chunks = create_chunks(pg_essay, chunk_size=250, overlap=30)\n", "\n", "for i, chunk in enumerate(chunks):\n", " print(f\"Chunk {i + 1}: {chunk}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "yafrkIP-4I3p" }, "source": [ "### 2. Generating Contextual Chunks\n", "\n", "This part contains the main intuition behind `Contextual Retrieval`. We make an LLM call for each chunk to add much-needed context to it. To do this, we pass in the ENTIRE document with every LLM call.\n", "\n", "Passing in the entire document for each chunk, and making one LLM call per chunk, may seem quite inefficient. It is, and there may well be more efficient techniques that accomplish the same end goal, but in keeping with the technique at hand, let's implement it as described.\n", "\n", "Additionally, using small quantized 1-3B models (here we will use Llama 3B) along with prompt caching makes this much more feasible.\n", "\n", "Prompt caching allows the key and value matrices corresponding to the document to be cached for future LLM calls."
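] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make the caching point concrete, here is a minimal illustrative sketch (our own helper, not part of any API) of why the template in the next cell places the document first: the shared prefix is identical across all per-chunk calls, so its key/value activations can be reused, and only the short chunk-specific suffix changes per call." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative sketch only: prompt caching reuses key/value activations for a\n", "# shared prompt *prefix*, so we keep the (identical) document first and the\n", "# (varying) chunk last in every per-chunk prompt.\n", "\n", "def build_cache_friendly_prompt(document: str, chunk: str) -> str:\n", " # Identical across all calls for one document -> cacheable prefix\n", " shared_prefix = f\"Given the document below, we want to explain what the chunk captures in the document.\\n\\n{document}\\n\\n\"\n", " # Differs per call -> small suffix that must be recomputed\n", " varying_suffix = f\"Here is the chunk we want to explain:\\n\\n{chunk}\\n\"\n", " return shared_prefix + varying_suffix" 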
] }, { "cell_type": "code", "execution_count": 10, "metadata": { "id": "vNN7oQg_Ppoc" }, "outputs": [], "source": [ "import together, os\n", "from together import Together\n", "\n", "# Paste in your Together AI API Key or load it\n", "TOGETHER_API_KEY = os.environ.get(\"TOGETHER_API_KEY\")\n", "client = Together(api_key = TOGETHER_API_KEY)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "id": "_9TXZMBQM5i5" }, "outputs": [], "source": [ "# We want to generate a snippet explaining the relevance/importance of the chunk with\n", "# full document in mind.\n", "\n", "CONTEXTUAL_RAG_PROMPT = \"\"\"\n", "Given the document below, we want to explain what the chunk captures in the document.\n", "\n", "{WHOLE_DOCUMENT}\n", "\n", "Here is the chunk we want to explain:\n", "\n", "{CHUNK_CONTENT}\n", "\n", "Answer ONLY with a succinct explaination of the meaning of the chunk in the context of the whole document above.\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "id": "pqPzVWDFQM4R" }, "outputs": [], "source": [ "from typing import List\n", "\n", "# First we will just generate the prompts and examine them\n", "\n", "def generate_prompts(document : str, chunks : List[str]) -> List[str]:\n", " prompts = []\n", " for chunk in chunks:\n", " prompt = CONTEXTUAL_RAG_PROMPT.format(WHOLE_DOCUMENT=document, CHUNK_CONTENT=chunk)\n", " prompts.append(prompt)\n", " return prompts" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "id": "ZiPQsasRQhmj" }, "outputs": [], "source": [ "prompts = generate_prompts(pg_essay, chunks)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "zuX07sLaGw8k", "outputId": "81f70662-1f9d-4408-eba0-8a1c2bad4e94" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " even things they should will use founder mode as the excuse. Or managers who aren't founders will decide they should try to act like founders. That may even work, to some extent, but the results will be messy when it doesn't; the modular approach does at least limit the damage a bad CEO can do.Thanks to Brian Chesky, Patrick Collison, Ron Conway, Jessica Livingston, Elon Musk, Ryan Petersen, Harj Taggar, and Garry Tan for reading drafts of this.\n", "\n", "Here is the chunk we want to explain:\n", "\n", "September 2024At a YC event last week Brian Chesky gave a talk that everyone who was there will remember. Most founders I talked to afterward said it was the best they'd ever heard. Ron Conway, for the first time in his life, forgot to take notes. I'\n", "\n", "Answer ONLY with a succinct explaination of the meaning of the chunk in the context of the whole document above.\n", "\n" ] } ], "source": [ "print(prompts[0][6500:])" ] }, { "cell_type": "markdown", "metadata": { "id": "I7PoEhPpeiuQ" }, "source": [ "#### How much does generating one Contextual chunk cost?\n", "\n", "Below we'll look at total tokens per prompt and estimate the cost of generating one/many contextual chunks." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "id": "kfgROm3zRGmC" }, "outputs": [], "source": [ "# token counter function\n", "import tiktoken\n", "\n", "def num_tokens_from_string(string: str, encoding_name: str) -> int:\n", " \"\"\"Returns the number of tokens in a text string.\"\"\"\n", " encoding = tiktoken.get_encoding(encoding_name)\n", " num_tokens = len(encoding.encode(string))\n", " return num_tokens" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "yq3h1FJQSBtc", "outputId": "72209e79-f94d-4896-a2ec-ca5b1433f109" }, "outputs": [ { "data": { "text/plain": [ "1457" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# We'll assume we're using the \"cl100k_base\" encoding\n", "\n", "num_tokens_from_string(pg_essay, \"cl100k_base\")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "4tuiVk_mRK5m", "outputId": "d9af0101-5971-4655-9570-f2a423f659a5" }, "outputs": [ { "data": { "text/plain": [ "1565" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "num_tokens_from_string(prompts[0], \"cl100k_base\")" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "YcvmEoKde2Ug", "outputId": "53bca5da-c52b-4490-8b6b-32ecf7d44426" }, "outputs": [ { "data": { "text/plain": [ "1559.21875" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#import python mean function\n", "from statistics import mean\n", "\n", "mean([num_tokens_from_string(prompt, \"cl100k_base\") for prompt in prompts])" ] }, { "cell_type": "code", "execution_count": 120, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "dDIPvz7_jfJi", "outputId": "1c46bc06-0cc1-40cc-f518-af44bc4705b2" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cost to generate contextual chunks = $0.0031872\n", "Number of Chunks Processed with $1.00 = 10040.160642570281 chunks\n" ] } ], "source": [ "print(f\"Cost to generate contextual chunks = ${32*0.06*(1660/1000000)}\")\n", "print(f\"Number of Chunks Processed with $1.00 = {1000000/(0.06*1660)} chunks\")" ] }, { "cell_type": "markdown", "metadata": { "id": "PbL2i6LEiy0j" }, "source": [ "`$0.06` per 1 million tokens for Llama 3b.\n", "\n", "Assuming input lenght of ~ 1560 tokens and output length of 100 tokens.\n", "\n", "Given an approximate token count of ~ 1660 per context generation we can generate 10,000 contexts for a $1.00.\n", "\n", "Even if we were using the full context length of the model(~130,000) for each context generation we could generate ~128 contexts for $1.00." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "id": "rjMxrvZWSPOe" }, "outputs": [], "source": [ "def generate_context(prompt: str):\n", " \"\"\"\n", " Generates a contextual response based on the given prompt using the specified language model.\n", " Args:\n", " prompt (str): The input prompt to generate a response for.\n", " Returns:\n", " str: The generated response content from the language model.\n", " \"\"\"\n", " response = client.chat.completions.create(\n", " model=\"meta-llama/Llama-3.2-3B-Instruct-Turbo\",\n", " messages=[{\"role\": \"user\", \"content\": prompt}],\n", " temperature=1\n", " )\n", " return response.choices[0].message.content" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ryCIVaY9OO-0", "outputId": "77b58705-f196-486b-aa6a-453eda8d8da8" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This chunk introduces the topic of the article, discussing a recent talk by Brian Chesky at a Y Combinator (YC) event and the impressions and reactions of founders attending the event, setting the stage for exploring differing wisdom on how to run larger companies. \n", "\n", "September 2024At a YC event last week Brian Chesky gave a talk that everyone who was there will remember. Most founders I talked to afterward said it was the best they'd ever heard. Ron Conway, for the first time in his life, forgot to take notes. I'\n" ] } ], "source": [ "# Lets generate one context and concatenate it with its chunk to see what the output looks like\n", "context_0 = generate_context(prompts[0])\n", "print(context_0 + \" \\n\\n\" + chunks[0])" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "id": "rLtmJ384SdiS" }, "outputs": [], "source": [ "# Let's generate the entire list of contextual chunks\n", "\n", "contextual_chunks = [generate_context(prompts[i])+' '+chunks[i] for i in range(len(chunks))]" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "eOBw3ySwR49c", "outputId": "d9309f5c-b193-4247-dcab-81cf87ff8d8f" }, "outputs": [ { "data": { "text/plain": [ "(32, True)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# We should have one contextual chunk for each original chunk\n", "len(contextual_chunks), len(contextual_chunks) == len(chunks)" ] }, { "cell_type": "markdown", "metadata": { "id": "j-DInBMf4QlM" }, "source": [ "### 3. Embedding Generation\n", "\n", "We will now use `bge-large-en-v1.5` to embed the augmented chunks above into a vector index." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "id": "vwhX_mkv4QZA" }, "outputs": [], "source": [ "from typing import List\n", "import together\n", "import numpy as np\n", "\n", "def generate_embeddings(input_texts: List[str], model_api_string: str) -> List[List[float]]:\n", " \"\"\"Generate embeddings from Together python library.\n", "\n", " Args:\n", " input_texts: a list of string input texts.\n", " model_api_string: str. An API string for a specific embedding model of your choice.\n", "\n", " Returns:\n", " embeddings_list: a list of embeddings. 
Each element corresponds to each input text.\n", " \"\"\"\n", " outputs = client.embeddings.create(\n", " input=input_texts,\n", " model=model_api_string,\n", " )\n", " return [x.embedding for x in outputs.data]" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "id": "cmUOrm1pH1qV" }, "outputs": [], "source": [ "contextual_embeddings = generate_embeddings(contextual_chunks, \"BAAI/bge-large-en-v1.5\")" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "FvfRtzT8PDgK", "outputId": "20e04621-885e-4117-f1df-f66201db9e5e" }, "outputs": [ { "data": { "text/plain": [ "1024" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Each vector is 1024 dimensional\n", "\n", "len(contextual_embeddings[0])" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "id": "k01ylAlhPqDM" }, "outputs": [], "source": [ "# Generate the vector embeddings for the query\n", "query = \"What are 'skip-level' meetings?\"\n", "\n", "query_embedding = generate_embeddings([query], 'BAAI/bge-large-en-v1.5')[0]" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "id": "qbDSgwiRSKVt" }, "outputs": [], "source": [ "from sklearn.metrics.pairwise import cosine_similarity\n", "\n", "# Calculate cosine similarity between the query embedding and each chunk embedding\n", "similarity_scores = cosine_similarity([query_embedding], contextual_embeddings)\n", "indices = np.argsort(-similarity_scores)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "4w43cUWkUBHs", "outputId": "01d4f47c-4aea-44a9-9186-25eb11d065e9" }, "outputs": [ { "data": { "text/plain": [ "array([17, 18, 11, 26, 27])" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top_5_indices = indices[0][:5]\n", "top_5_indices" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "aCqPl68QSkB1", "outputId": "34ab4155-a724-4d7c-fd48-a843d390ff5a" }, "outputs": [ { "data": { "text/plain": [ "['This chunk discusses the shift in company management away from the \"manager mode\" that most companies follow, where CEOs engage with the company only through their direct reports, to \"founder mode\", where CEOs engage more directly with even higher-level employees and potentially skip over direct reports, potentially leading to \"skip-level\" meetings. ts of, it\'s pretty clear that it\'s going to break the principle that the CEO should engage with the company only via his or her direct reports. \"Skip-level\" meetings will become the norm instead of a practice so unusual that there\'s a name for it. An',\n", " 'This chunk refers to \"skip-level\" meetings, which are a key characteristic of founder mode, where the CEO engages directly with the company beyond their direct reports. This contrasts with the \"manager mode\" of addressing company issues, where decisions are made perfunctorily via a hierarchical system, to which founders instinctively rebel. that there\'s a name for it. 
And once you abandon that constraint there are a huge number of permutations to choose from.For example, Steve Jobs used to run an annual retreat for what he considered the 100 most important people at Apple, and these wer',\n", " 'This chunk explains that founder mode, a hypothetical approach to running a company by its founders, will differ from manager mode in that founders will engage directly with the company, rather than just their direct reports, through \"skip-level\" meetings, disregarding the traditional principle that CEOs should only interact with their direct reports, as managers do. can already guess at some of the ways it will differ.The way managers are taught to run companies seems to be like modular design in the sense that you treat subtrees of the org chart as black boxes. You tell your direct reports what to do, and it\'s',\n", " 'This chunk refers to having the CEO engage directly with non-direct reports, particularly in informal, high-level meetings, in order to foster a sense of community and innovation, as exemplified by Steve Jobs, rather than following the conventional hierarchical structure of \"manager mode\" where only the CEO communicates with direct reports through their representatives. o run their companies like Steve Jobs instead of John Sculley.Notes[1] The more diplomatic way of phrasing this statement would be to say that experienced C-level execs are often very skilled at managing up. And I don\'t think anyone with knowledge of',\n", " 'This chunk discusses how even experienced C-level executives (managers) are skilled at managing up and hiding their true intentions, which is why founding CEOs like Steve Jobs were able to get away with unconventional leadership strategies without facing immediate pushback from their management teams. think anyone with knowledge of this world would dispute that.[2] If the practice of having such retreats became so widespread that even mature companies dominated by politics started to do it, we could quantify the senescence of companies by the aver']" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top_5_chunks = [contextual_chunks[index] for index in indices[0]][:5]\n", "\n", "top_5_chunks" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "id": "CMCJ1PlvUxiy" }, "outputs": [], "source": [ "def vector_retreival(query: str, top_k: int = 5, vector_index: np.ndarray = None) -> List[int]:\n", " \"\"\"\n", " Retrieve the top-k most similar items from an index based on a query.\n", " Args:\n", " query (str): The query string to search for.\n", " top_k (int, optional): The number of top similar items to retrieve. Defaults to 5.\n", " vector_index (np.ndarray, optional): The index array containing embeddings to search against. 
Defaults to None.\n", " Returns:\n", " List[int]: A list of indices corresponding to the top-k most similar items in the index.\n", " \"\"\"\n", "\n", " query_embedding = generate_embeddings([query], 'BAAI/bge-large-en-v1.5')[0]\n", " similarity_scores = cosine_similarity([query_embedding], vector_index)\n", "\n", " return list(np.argsort(-similarity_scores)[0][:top_k])" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "wvjGlof0VViw", "outputId": "f6d2ed8e-2313-4710-862a-8577fe4036c9" }, "outputs": [ { "data": { "text/plain": [ "[17, 18, 11, 26, 27]" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vector_retreival(query=\"What are 'skip-level' meetings?\", top_k=5, vector_index=contextual_embeddings)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have a way to retrieve from the vector index given a query." ] }, { "cell_type": "markdown", "metadata": { "id": "8k9w5eKU0yqq" }, "source": [ "### 4. BM25 Search Using `bm25s` library\n", "\n", "Let's build a keyword index that allows us to use BM25 to perform lexical search based on the words present in the query and the contextual chunks." ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "4xsEnsJb0zCA", "outputId": "bffbdff3-2359-4b44-bda2-ec48f555b706" }, "outputs": [ { "data": { "text/plain": [ "Split strings: 0%| | 0/32 [00:00<?, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import bm25s\n", "\n", "# Build the BM25 keyword index over the contextual chunks\n", "bm25_index = bm25s.BM25(corpus=contextual_chunks)\n", "bm25_index.index(bm25s.tokenize(contextual_chunks))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def bm25_retreival(query: str, k: int, bm25_index) -> List[int]:\n", " \"\"\"\n", " Retrieve the top-k document indices based on the BM25 algorithm for a given query.\n", " Args:\n", " query (str): The search query string.\n", " k (int): The number of top documents to retrieve.\n", " bm25_index: The BM25 index object used for retrieval.\n", " Returns:\n", " List[int]: A list of indices of the top-k documents that match the query.\n", " \"\"\"\n", "\n", " results, scores = bm25_index.retrieve(bm25s.tokenize(query), k=k)\n", "\n", " return [contextual_chunks.index(doc) for doc in results[0]]" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "h2TctI7qUjDC", "outputId": "1d009443-d358-4331-ffae-e2b8e6299f94" }, "outputs": [ { "data": { "text/plain": [ "Split strings: 0%| | 0/1 [00:00<?, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "bm25_retreival(query=\"What are 'skip-level' meetings?\", k=5, bm25_index=bm25_index)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Query Time\n", "\n", "### 1. Perform Vector and BM25 retrieval and then fuse the ranks using the RRF algorithm implemented below." ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "JQOpIyuc4Yyj", "outputId": "ff8f0f24-a53c-4a48-94db-2c3d163470ad" }, "outputs": [ { "data": { "text/plain": [ "Split strings: 0%| | 0/1 [00:00