{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "072150ea-1f44-4428-94ae-695ba94b2f7d",
   "metadata": {},
   "source": [
    "# Retrieval-Augmented Generation for Presidential Speeches using Groq API and Langchain"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7a4fc92-eb9a-4273-8ff6-0fc5b96236d7",
   "metadata": {},
   "source": [
    "Retrieval-Augmented Generation (RAG) is a widely-used technique that enables us to gather pertinent information from an external data source and provide it to our Large Language Model (LLM). It helps solve two of the biggest limitations of LLMs: knowledge cutoffs, in which information after a certain date or for a specific source is not available to the LLM, and hallucination, in which the LLM makes up an answer to a question it doesn't have the information for. With RAG, we can ensure that the LLM has relevant information to answer the question at hand."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea1ae66c-a322-467d-b789-f7ce5a636ad7",
   "metadata": {},
   "source": [
    "In this notebook we will be using [Groq API](https://console.groq.com), [LangChain](https://www.langchain.com/) and [Pinecone](https://www.pinecone.io/) to perform RAG on [presidential speech transcripts](https://millercenter.org/the-presidency/presidential-speeches) from the Miller Center at the University of Virginia. In doing so, we will create vector embeddings for each speech, store them in a vector database, retrieve the most relevent speech excerpts pertaining to the user prompt and include them in context for the LLM."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7784880-495e-4d7c-a045-d12b7f57b65d",
   "metadata": {},
   "source": [
    "### Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "b4679c23-7035-4276-b3d6-95cd89916477",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from groq import Groq\n",
    "import os\n",
    "import pinecone\n",
    "\n",
    "from langchain_community.vectorstores import Chroma\n",
    "from langchain.text_splitter import TokenTextSplitter\n",
    "from langchain.docstore.document import Document\n",
    "from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings\n",
    "from langchain_pinecone import PineconeVectorStore\n",
    "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
    "from sklearn.metrics.pairwise import cosine_similarity\n",
    "\n",
    "from IPython.display import display, HTML"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4c18688b-178f-439d-90a4-590f99ade11f",
   "metadata": {},
   "source": [
    "A Groq API Key is required for this demo - you can generate one for free [here](https://console.groq.com/). We will be using Pinecone as our vector database, which also requires an API key (you can create one index for a small project there for free on their Starter plan), but will also show how it works with [Chroma DB](https://www.trychroma.com/), a free open source alternative that stores vector embeddings in memory. We will also use the Llama3 8b model for this demo."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "14fd5b33-360e-4fbe-ad29-11d5f759b0d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "groq_api_key = os.getenv('GROQ_API_KEY')\n",
    "pinecone_api_key = os.getenv('PINECONE_API_KEY')\n",
    "\n",
    "client = Groq(api_key = groq_api_key)\n",
    "model = \"llama3-8b-8192\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "469e5b3a-6c5d-49cd-a547-222d45d7a996",
   "metadata": {},
   "source": [
    "### RAG Basics with One Document"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "283183cd-ba64-4e98-a0d9-a6165e88494e",
   "metadata": {},
   "source": [
    "The presidential speeches we'll be using are stored in this [.csv file](https://github.com/groq/groq-api-cookbook/blob/main/presidential-speeches-rag/presidential_speeches.csv). Each row of the .csv contains fields for the date, president, party, speech title, speech summary and speech transcript, and includes every recorded presidential speech through the Trump presidency:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "d1017409-cb0e-402b-9c53-c61729296bd2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>President</th>\n",
       "      <th>Party</th>\n",
       "      <th>Speech Title</th>\n",
       "      <th>Summary</th>\n",
       "      <th>Transcript</th>\n",
       "      <th>URL</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1789-04-30</td>\n",
       "      <td>George Washington</td>\n",
       "      <td>Unaffiliated</td>\n",
       "      <td>First Inaugural Address</td>\n",
       "      <td>Washington calls on Congress to avoid local an...</td>\n",
       "      <td>Fellow Citizens of the Senate and the House of...</td>\n",
       "      <td>https://millercenter.org/the-presidency/presid...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1789-10-03</td>\n",
       "      <td>George Washington</td>\n",
       "      <td>Unaffiliated</td>\n",
       "      <td>Thanksgiving Proclamation</td>\n",
       "      <td>At the request of Congress, Washington establi...</td>\n",
       "      <td>Whereas it is the duty of all Nations to ackno...</td>\n",
       "      <td>https://millercenter.org/the-presidency/presid...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1790-01-08</td>\n",
       "      <td>George Washington</td>\n",
       "      <td>Unaffiliated</td>\n",
       "      <td>First Annual Message to Congress</td>\n",
       "      <td>In a wide ranging speech, President Washington...</td>\n",
       "      <td>Fellow Citizens of the Senate and House of Rep...</td>\n",
       "      <td>https://millercenter.org/the-presidency/presid...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1790-12-08</td>\n",
       "      <td>George Washington</td>\n",
       "      <td>Unaffiliated</td>\n",
       "      <td>Second Annual Message to Congress</td>\n",
       "      <td>Washington focuses on commerce in his second a...</td>\n",
       "      <td>Fellow citizens of the Senate and House of Rep...</td>\n",
       "      <td>https://millercenter.org/the-presidency/presid...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1790-12-29</td>\n",
       "      <td>George Washington</td>\n",
       "      <td>Unaffiliated</td>\n",
       "      <td>Talk to the Chiefs and Counselors of the Senec...</td>\n",
       "      <td>The President reassures the Seneca Nation that...</td>\n",
       "      <td>I the President of the United States, by my ow...</td>\n",
       "      <td>https://millercenter.org/the-presidency/presid...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Date          President         Party  \\\n",
       "0  1789-04-30  George Washington  Unaffiliated   \n",
       "1  1789-10-03  George Washington  Unaffiliated   \n",
       "2  1790-01-08  George Washington  Unaffiliated   \n",
       "3  1790-12-08  George Washington  Unaffiliated   \n",
       "4  1790-12-29  George Washington  Unaffiliated   \n",
       "\n",
       "                                        Speech Title  \\\n",
       "0                            First Inaugural Address   \n",
       "1                          Thanksgiving Proclamation   \n",
       "2                   First Annual Message to Congress   \n",
       "3                  Second Annual Message to Congress   \n",
       "4  Talk to the Chiefs and Counselors of the Senec...   \n",
       "\n",
       "                                             Summary  \\\n",
       "0  Washington calls on Congress to avoid local an...   \n",
       "1  At the request of Congress, Washington establi...   \n",
       "2  In a wide ranging speech, President Washington...   \n",
       "3  Washington focuses on commerce in his second a...   \n",
       "4  The President reassures the Seneca Nation that...   \n",
       "\n",
       "                                          Transcript  \\\n",
       "0  Fellow Citizens of the Senate and the House of...   \n",
       "1  Whereas it is the duty of all Nations to ackno...   \n",
       "2  Fellow Citizens of the Senate and House of Rep...   \n",
       "3  Fellow citizens of the Senate and House of Rep...   \n",
       "4  I the President of the United States, by my ow...   \n",
       "\n",
       "                                                 URL  \n",
       "0  https://millercenter.org/the-presidency/presid...  \n",
       "1  https://millercenter.org/the-presidency/presid...  \n",
       "2  https://millercenter.org/the-presidency/presid...  \n",
       "3  https://millercenter.org/the-presidency/presid...  \n",
       "4  https://millercenter.org/the-presidency/presid...  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "presidential_speeches_df = pd.read_csv('presidential_speeches.csv')\n",
    "presidential_speeches_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9aaabfb-c34d-40f1-a90f-f448a9051130",
   "metadata": {},
   "source": [
    "To get a better idea of the steps involved in building a RAG system, let's focus on a single speech to start. In honor of his [upcoming Netflix series](https://www.netflix.com/tudum/articles/death-by-lightning-tv-series-adaptation) and his distinction of being the only president to [contribute an original proof of the Pythagorean Theorem](https://maa.org/press/periodicals/convergence/mathematical-treasure-james-a-garfields-proof-of-the-pythagorean-theorem), we'll use James Garfield's Inaugural Address:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "39439748-8652-415d-a5e7-0f421a6ae30a",
   "metadata": {},
   "outputs": [],
   "source": [
    "garfield_inaugural = presidential_speeches_df.iloc[309].Transcript\n",
    "#display(HTML(garfield_inaugural)) "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e1a9811-2fd6-4c99-ba11-a9df2df33ec0",
   "metadata": {},
   "source": [
    "A challenge with prompting LLMs can be running into limits with their context window. While this speech is not extremely long and would actually fit in Llama3's context window, it is not always great practice to use way more of the context window than you need, so when using RAG we want to split up the text to provide only relevant parts of it to the LLM. To do so, we first need to ```tokenize``` the transcript. We'll use the ```sentence-transformers/all-MiniLM-L6-v2``` tokenzier with the transformers AutoTokenizer class for this - this will show the number of tokens the model counts in Garfield's Inaugural Address:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "c6057e9f-874e-4d7a-9f3c-e411a9acbb2e",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Token indices sequence length is longer than the specified maximum sequence length for this model (3420 > 512). Running this sequence through the model will result in indexing errors\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "3420"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_id = \"sentence-transformers/all-MiniLM-L6-v2\"\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
    "\n",
    "# create the length function\n",
    "def token_len(text):\n",
    "    tokens = tokenizer.encode(\n",
    "        text\n",
    "    )\n",
    "    return len(tokens)\n",
    "\n",
    "token_len(garfield_inaugural)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71de3b6c-e89b-4446-a585-32cf10a1cd8e",
   "metadata": {},
   "source": [
    "Next, we'll split the text into chunks using LangChain's `TokenTextSplitter` function. In this example we will set the maximum tokens in a chunk to be 450, with a 20 token overlap to reduce the chances that a sentence or concept will be split into different chunks.\n",
    "\n",
    "Note that LangChain uses OpenAI's `tiktoken` tokenizer, so our tokenizer will count tokens a bit differently - when adjusting for this, our chunk sizes will be around 500 tokens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "20ba719b-9a03-437a-a665-a0bde9ec24cf",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "453\n",
      "455\n",
      "467\n",
      "457\n",
      "457\n",
      "455\n",
      "461\n",
      "368\n"
     ]
    }
   ],
   "source": [
    "text_splitter = TokenTextSplitter(\n",
    "    chunk_size=450, # 500 tokens is the max\n",
    "    chunk_overlap=20 # Overlap of N tokens between chunks (to reduce chance of cutting out relevant connected text like middle of sentence)\n",
    ")\n",
    "\n",
    "chunks = text_splitter.split_text(garfield_inaugural)\n",
    "\n",
    "for chunk in chunks:\n",
    "    print(token_len(chunk))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce723eea-7e69-48c1-8452-957709d117db",
   "metadata": {},
   "source": [
    "Next, we will embed each chunk into a semantic vector space using the all-MiniLM-L6-v2 model, through LangChain's implementation of Sentence Transformers from [HuggingFace](https://huggingface.co/sentence-transformers). Note that each embedding has a length of 384."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "740cea0e-c568-4522-995b-2bc1b9f1d4d8",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "384 [-0.041311442852020264, 0.04761345684528351, 0.007975001819431782, -0.030207891017198563, 0.04763732850551605, 0.03253324702382088, 0.012350181117653847, -0.044836871325969696, -0.008013647049665451, 0.015704018995165825, -0.0009443548624403775, 0.11632765829563141, -0.007115611340850592, -0.03356580808758736, -0.043237943202257156, 0.06872360408306122, -0.04552490636706352, -0.07017458975315094, -0.10271692276000977, 0.11116139590740204]\n"
     ]
    }
   ],
   "source": [
    "chunk_embeddings = []\n",
    "embedding_function = SentenceTransformerEmbeddings(model_name=\"all-MiniLM-L6-v2\")\n",
    "for chunk in chunks:\n",
    "    chunk_embeddings.append(embedding_function.embed_query(chunk))\n",
    "\n",
    "print(len(chunk_embeddings[0]),chunk_embeddings[0][:20]) #Shows first 25 embeddings out of 384"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a52f835-69f8-465d-a06e-fb5e31656b37",
   "metadata": {},
   "source": [
    "Finally, we will embed our prompt and use cosine similarity to find the most relevant chunk to the question we'd like answered:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "e3315f7c-6523-4aca-a624-2e2076b3e6bf",
   "metadata": {},
   "outputs": [],
   "source": [
    "user_question = \"What were James Garfield's views on civil service reform?\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "587efed4-1d6a-402e-9c47-7259fbe898be",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       " permitted to usurp in the smallest degree the functions and powers of the National Government. The civil service can never be placed on a satisfactory basis until it is regulated by law. For the good of the service itself, for the protection of those who are intrusted with the appointing power against the waste of time and obstruction to the public business caused by the inordinate pressure for place, and for the protection of incumbents against intrigue and wrong, I shall at the proper time ask Congress to fix the tenure of the minor offices of the several Executive Departments and prescribe the grounds upon which removals shall be made during the terms for which incumbents have been appointed. Finally, acting always within the authority and limitations of the Constitution, invading neither the rights of the States nor the reserved rights of the people, it will be the purpose of my Administration to maintain the authority of the nation in all places within its jurisdiction; to enforce obedience to all the laws of the Union in the interests of the people; to demand rigid economy in all the expenditures of the Government, and to require the honest and faithful service of all executive officers, remembering that the offices were created, not for the benefit of incumbents or their supporters, but for the service of the Government. And now, fellow citizens, I am about to assume the great trust which you have committed to my hands. I appeal to you for that earnest and thoughtful support which makes this Government in fact, as it is in law, a government of the people. I shall greatly rely upon the wisdom and patriotism of Congress and of those who may share with me the responsibilities and duties of administration, and, above all, upon our efforts to promote the welfare of this great people and their Government I reverently invoke the support and blessings of AlmightyGod"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "prompt_embeddings = embedding_function.embed_query(user_question) \n",
    "similarities = cosine_similarity([prompt_embeddings], chunk_embeddings)[0] \n",
    "closest_similarity_index = np.argmax(similarities) \n",
    "most_relevant_chunk = chunks[closest_similarity_index]\n",
    "display(HTML(most_relevant_chunk))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "913592d1-454f-43c1-9255-956c7c37b222",
   "metadata": {},
   "source": [
    "Now, we can feed the most relevant speech expert into our chat completion model so that the LLM can use it to answer our question:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "159ace36-71bf-4af9-9719-83ba1182071f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'James Garfield, in his inaugural address on March 4, 1881, briefly touched on the subject of civil service reform. He expressed his belief that the civil service could not be placed on a satisfactory basis until it was regulated by law. He also mentioned his intention to ask Congress to fix the tenure of minor offices and prescribe the grounds for removal during the terms for which incumbents had been appointed. He stated that this would be done to protect those with appointing power, incumbents, and to ensure honest and faithful service from executive officers. Garfield believed that offices were created for the service of the Government, not for the benefit of incumbents or their supporters.\\n\\nSource: Inaugural Address, March 4, 1881.'"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# A chat completion function that will use the most relevant exerpt(s) from presidential speeches to answer the user's question\n",
    "def presidential_speech_chat_completion(client, model, user_question, relevant_excerpts):\n",
    "    chat_completion = client.chat.completions.create(\n",
    "        messages = [\n",
    "            {\n",
    "                \"role\": \"system\",\n",
    "                \"content\": \"You are a presidential historian. Given the user's question and relevant excerpts from presidential speeches, answer the question by including direct quotes from presidential speeches. When using a quote, site the speech that it was from (ignoring the chunk).\" \n",
    "            },\n",
    "            {\n",
    "                \"role\": \"user\",\n",
    "                \"content\": \"User Question: \" + user_question + \"\\n\\nRelevant Speech Exerpt(s):\\n\\n\" + relevant_excerpts,\n",
    "            }\n",
    "        ],\n",
    "        model = model\n",
    "    )\n",
    "    \n",
    "    response = chat_completion.choices[0].message.content\n",
    "    return response\n",
    "\n",
    "\n",
    "presidential_speech_chat_completion(client, model, user_question, most_relevant_chunk)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b144390-558b-47a5-9c67-9e3a7b6c8138",
   "metadata": {},
   "source": [
    "### Using a Vector DB to store and retrieve embeddings for all speeches"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23ab35ad-d47b-4bfe-ac9c-5fbf4946f3ae",
   "metadata": {},
   "source": [
    "Now, let's repeat the same process for every speech in our .csv using the same text splitter as above. Note that we will be converting our text to a `Document` object so that it integrates with the vector database, and also prepending the president, date and title to the speech transcript to provide more context to the LLM:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "94d8ba00-4360-4313-a271-272013a74f66",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10698\n"
     ]
    }
   ],
   "source": [
    "documents = []\n",
    "for index, row in presidential_speeches_df[presidential_speeches_df['Transcript'].notnull()].iterrows():\n",
    "    chunks = text_splitter.split_text(row.Transcript)\n",
    "    total_chunks = len(chunks)\n",
    "    for chunk_num in range(1,total_chunks+1):\n",
    "        header = f\"Date: {row['Date']}\\nPresident: {row['President']}\\nSpeech Title: {row['Speech Title']} (chunk {chunk_num} of {total_chunks})\\n\\n\"\n",
    "        chunk = chunks[chunk_num-1]\n",
    "        documents.append(Document(page_content=header + chunk, metadata={\"source\": \"local\"}))\n",
    "\n",
    "print(len(documents))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eec976bc-5f33-49bc-a61a-5f2ee4a293d6",
   "metadata": {},
   "source": [
    "I will be using a Pinecone index called `presidential-speeches` for this demo. As mentioned above, you can sign up for Pinecone's Starter plan for free and have access to a single index, which is ideal for a small personal project. You can also use Chroma DB as an open source alternative. Note that either Vector DB will use the same embedding function we've defined above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "84d9ef15-62b4-4961-80d0-27a558389c8c",
   "metadata": {},
   "outputs": [],
   "source": [
    "pinecone_index_name = \"presidential-speeches\"\n",
    "docsearch = PineconeVectorStore.from_documents(documents, embedding_function, index_name=pinecone_index_name)\n",
    "\n",
    "### Use Chroma for open source option\n",
    "#docsearch = Chroma.from_documents(documents, embedding_function)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "83dcec95-98f3-4d11-bb43-2ab967741067",
   "metadata": {},
   "source": [
    "Fortunately, all of the manual work we did above to embed text and use cosine similarity to find the most relevant chunk is done under the hood when using a vector database. Now, we can ask our question again, over the entire corpus of presidential speeches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "f7d6698f-0331-43e7-9a83-2c5b684ec44c",
   "metadata": {},
   "outputs": [],
   "source": [
    "user_question = \"What were James Garfield's views on civil service reform?\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "91f88d8d-d2a9-4289-a3b6-0a8415f0e72b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "relevent_docs = docsearch.similarity_search(user_question)\n",
    "\n",
    "# print results\n",
    "#display(HTML(relevent_docs[0].page_content))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "190c1a66-ded4-4340-94e9-0a789704c03d",
   "metadata": {},
   "source": [
    "We will use the three most relevant excerpts in our system prompt. Note that even with nearly 1000 speeches chunked and stored in our vector database, the similarity search still found the same one as when we only parsed Garfield's Inaugural Address:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "77a5b3bc-7e6a-40d8-b012-38bfebeaa641",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "Date: 1881-03-04<br>President: James A. Garfield<br>Speech Title: Inaugural Address (chunk 8 of 8)<br><br> permitted to usurp in the smallest degree the functions and powers of the National Government. The civil service can never be placed on a satisfactory basis until it is regulated by law. For the good of the service itself, for the protection of those who are intrusted with the appointing power against the waste of time and obstruction to the public business caused by the inordinate pressure for place, and for the protection of incumbents against intrigue and wrong, I shall at the proper time ask Congress to fix the tenure of the minor offices of the several Executive Departments and prescribe the grounds upon which removals shall be made during the terms for which incumbents have been appointed. Finally, acting always within the authority and limitations of the Constitution, invading neither the rights of the States nor the reserved rights of the people, it will be the purpose of my Administration to maintain the authority of the nation in all places within its jurisdiction; to enforce obedience to all the laws of the Union in the interests of the people; to demand rigid economy in all the expenditures of the Government, and to require the honest and faithful service of all executive officers, remembering that the offices were created, not for the benefit of incumbents or their supporters, but for the service of the Government. And now, fellow citizens, I am about to assume the great trust which you have committed to my hands. I appeal to you for that earnest and thoughtful support which makes this Government in fact, as it is in law, a government of the people. 
I shall greatly rely upon the wisdom and patriotism of Congress and of those who may share with me the responsibilities and duties of administration, and, above all, upon our efforts to promote the welfare of this great people and their Government I reverently invoke the support and blessings of AlmightyGod<br><br>------------------------------------------------------<br><br>Date: 1881-03-04<br>President: James A. Garfield<br>Speech Title: Inaugural Address (chunk 3 of 8)<br><br> is the most important political change we have known since the adoption of the Constitution of 1787. NO thoughtful man can fail to appreciate its beneficent effect upon our institutions and people. It has freed us from the perpetual danger of war and dissolution. It has added immensely to the moral and industrial forces of our people. It has liberated the master as well as the slave from a relation which wronged and enfeebled both. It has surrendered to their own guardianship the manhood of more than mechanical ) double, and has opened to each one of them a career of freedom and usefulness. It has given new inspiration to the power of self help in both races by making labor more honorable to the one and more necessary to the other. The influence of this force will grow greater and bear richer fruit with the coming years. No doubt this great change has caused serious disturbance to our Southern communities. This is to be deplored, though it was perhaps unavoidable. But those who resisted the change should remember that under our institutions there was no middle ground for the negro race between slavery and equal citizenship. There can be no permanent disfranchised peasantry in the UnitedStates. Freedom can never yield its fullness of blessings so long as the law or its administration places the smallest obstacle in the pathway of any virtuous citizen. The emancipated race has already made remarkable progress. 
With unquestioning devotion to the Union, with a patience and gentleness not born of fear, they have “followed the light as God gave them to see the light.” They are rapidly laying the material foundations of self support, widening their circle of intelligence, and beginning to enjoy the blessings that gather around the homes of the industrious poor. They deserve the generous encouragement of all good men. So far as my authority can lawfully extend they shall enjoy the full and equal protection of the Constitution and the laws. The free enjoyment of equal suffrage is still in question, and a frank statement of the issue may aid its solution. It is alleged that in many communities negro citizens are practically denied the freedom of the ballot. In so far as the truth of this allegation is admitted, it is answered that in many places honest local government is impossible if the mass of un<br><br>------------------------------------------------------<br><br>Date: 1882-12-04<br>President: Chester A. Arthur<br>Speech Title: Second Annual Message (chunk 27 of 29)<br><br> the Senate bill, to which I have already referred, exclusively applies. While neither that bill nor any other prominent scheme for improving the civil service concerns the higher grade of officials, who are appointed by the President and confirmed by the Senate, I feel bound to correct a prevalent misapprehension as to the frequency with which the present Executive has displaced the incumbent of an office and appointed another in his stead. It has been repeatedly alleged that he has in this particular signally departed from the course which has been pursued under recent Administrations of the Government. The facts are as follows: The whole number of Executive appointments during the four years immediately preceding Mr. Garfield's accession to the Presidency was 2,696. Of this number 244, or 9 per cent, involved the removal of previous incumbents. 
The ratio of removals to the whole number of appointments was much the same during each of those four years. In the first year, with 790 appointments, there were 74 removals, or 9.3 per cent; in the second, with 917 appointments, there were 85 removals, or 8.5 per cent; in the third, with 480 appointments, there were 48 removals, or 10 per cent; in the fourth, with 429 appointments, there were 37 removals, or 8.6 per cent. In the four months of President Garfield's Administration there were 390 appointments and 89 removals, or 22.7 per cent. Precisely the same number of removals ( 89 ) has taken place in the fourteen months which have since elapsed, but they constitute only 7.8 per cent of the whole number of appointments ( 1,11MADISON. By within that period and less than 2.6 of the entire list of officials ( 3,459 ), exclusive of the Army and Navy, which is filled by Presidential appointment. I declare my approval of such legislation as may be found necessary for supplementing the existing provisions of law in relation to political assessments. In July last I authorized a public announcement that employees of the Government should regard themselves as at liberty to exercise their pleasure in making or refusing to make political contributions, and that their action in that regard would in no manner"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
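    "# Join the first three retrieved chunks with a dashed separator; this string is later passed to the LLM as context\n",
    "relevant_excerpts = '\\n\\n------------------------------------------------------\\n\\n'.join([doc.page_content for doc in relevent_docs[:3]])\n",
    "# Render the excerpts in the notebook, converting newlines to HTML line breaks\n",
    "display(HTML(relevant_excerpts.replace(\"\\n\", \"<br>\")))"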
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "9b2fc804-4d5e-4db2-b185-79ff85c36362",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'James Garfield, in his Inaugural Address delivered on March 4, 1881, expressed his views on civil service reform. He believed that the civil service could not be placed on a satisfactory basis until it was regulated by law. He proposed to ask Congress to fix the tenure of the minor offices of the several Executive Departments and prescribe the grounds upon which removals shall be made during the terms for which incumbents have been appointed. He stated, \"For the good of the service itself, for the protection of those who are intrusted with the appointing power against the waste of time and obstruction to the public business caused by the inordinate pressure for place, and for the protection of incumbents against intrigue and wrong, I shall at the proper time ask Congress to fix the tenure of the minor offices of the several Executive Departments and prescribe the grounds upon which removals shall be made during the terms for which incumbents have been appointed.\"\\n\\nHe also mentioned that he will act within the authority and limitations of the Constitution, invading neither the rights of the States nor the reserved rights of the people, it will be the purpose of my Administration to maintain the authority of the nation in all places within its jurisdiction; to enforce obedience to all the laws of the Union in the interests of the people; to demand rigid economy in all the expenditures of the Government, and to require the honest and faithful service of all executive officers, remembering that the offices were created, not for the benefit of incumbents or their supporters, but for the service of the Government.\\n\\nIt is also worth noting that Garfield\\'s successor, Chester A. 
Arthur, in his Second Annual Message delivered on December 4, 1882, mentioned that Garfield\\'s administration had a higher percentage of removals (22.7%) than the previous four administrations (the ratio of removals to the whole number of appointments was much the same during each of those four years, and ranged from 8.6% to 10%). Arthur states that \"In the four months of President Garfield\\'s Administration there were 390 appointments and 89 removals, or 22.7 per cent. Precisely the same number of removals (89) has taken place in the fourteen months which have since elapsed, but they constitute only 7.8 per cent of the whole number of appointments (1,119) within that period and less than 2.6 per cent of the entire list of officials (3,459), exclusive of the Army and Navy, which is filled by Presidential appointment. \"'"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "presidential_speech_chat_completion(client, model, user_question, relevant_excerpts)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4bc73962-81aa-4ca7-88dd-83793195d382",
   "metadata": {},
   "source": [
    "# Conclusion"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9b86556-d31d-4896-a342-8eff9d9fb48b",
   "metadata": {},
   "source": [
    "In this notebook we've shown how to implement a RAG system using Groq API, LangChain and Pinecone by embedding, storing and searching over nearly 1,000 speeches from US presidents. By storing speech transcript embeddings in a vector database and retrieving the most relevant excerpts with semantic search, we give the LLM the context it needs to answer the question, which helps mitigate two of the most significant limitations of LLMs: knowledge cutoffs and hallucination.\n",
    "\n",
    "You can interact with this RAG application here: https://presidential-speeches-rag.streamlit.app/"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}