
updated the RAG example

Thierry Moreau · 11 months ago
commit 54f0949828

+ 24 - 30
recipes/llama_api_providers/OctoAI_API_examples/RAG_Chatbot_example/RAG_Chatbot_Example.ipynb

@@ -4,16 +4,16 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Building a Llama 2 chatbot with Retrieval Augmented Generation (RAG)\n",
+    "# Building a Llama 3 chatbot with Retrieval Augmented Generation (RAG)\n",
     "\n",
     "This notebook shows a complete example of how to build a Llama 2 chatbot hosted on your browser that can answer questions based on your own data. We'll cover:\n",
-    "* How to run Llama2 in the cloud hosted on OctoAI\n",
+    "* How to run Llama 3 in the cloud hosted on OctoAI\n",
     "* A chatbot example built with [Gradio](https://github.com/gradio-app/gradio) and wired to the server\n",
-    "* Adding RAG capability with Llama 2 specific knowledge based on our Getting Started [guide](https://ai.meta.com/llama/get-started/)\n",
+    "* Adding RAG capability with Llama 3 specific knowledge based on our Getting Started [guide](https://ai.meta.com/llama/get-started/)\n",
     "\n",
     "\n",
     "**Note** We will be using OctoAI to run the examples here. You will need to first sign into [OctoAI](https://octoai.cloud/) with your Github or Google account, then create a free API token [here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token) that you can use for a while (a month or $10 in OctoAI credits, whichever one runs out first).\n",
-    "After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on OctoAI."
+    "After the free trial ends, you will need to enter billing info to continue to use Llama 3 hosted on OctoAI."
    ]
   },
   {
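
A note for readers following along: LangChain's OctoAI integrations typically read the access token from the `OCTOAI_API_TOKEN` environment variable. A minimal sketch (the token value is a placeholder, and the variable name is an assumption based on the LangChain community integration):

```python
import os

# Placeholder token; LangChain's OctoAI classes are expected to pick it up
# from this environment variable.
os.environ["OCTOAI_API_TOKEN"] = "<your-octoai-api-token>"
```
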
@@ -51,14 +51,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## How to Develop a RAG Powered Llama 2 Chatbot\n",
+    "## How to Develop a RAG Powered Llama 3 Chatbot\n",
     "\n",
-    "The easiest way to develop RAG-powered Llama 2 chatbots is to use frameworks such as [**LangChain**](https://www.langchain.com/) and [**LlamaIndex**](https://www.llamaindex.ai/), two leading open-source frameworks for building LLM apps. Both offer convenient APIs for implementing RAG with Llama 2 including:\n",
+    "The easiest way to develop RAG-powered Llama 3 chatbots is to use frameworks such as [**LangChain**](https://www.langchain.com/) and [**LlamaIndex**](https://www.llamaindex.ai/), two leading open-source frameworks for building LLM apps. Both offer convenient APIs for implementing RAG with Llama 3 including:\n",
     "\n",
     "* Load and split documents\n",
     "* Embed and store document splits\n",
     "* Retrieve the relevant context based on the user query\n",
-    "* Call Llama 2 with query and context to generate the answer\n",
+    "* Call Llama 3 with query and context to generate the answer\n",
     "\n",
     "LangChain is a more general purpose and flexible framework for developing LLM apps with RAG capabilities, while LlamaIndex as a data framework focuses on connecting custom data sources to LLMs. The integration of the two may provide the best performant and effective solution to building real world RAG apps.\n",
     "In our example, for simplicifty, we will use LangChain alone with locally stored PDF data."
@@ -73,7 +73,7 @@
     "For this demo, we will be using the Gradio for chatbot UI, Text-generation-inference framework for model serving.\n",
     "For vector storage and similarity search, we will be using [FAISS](https://github.com/facebookresearch/faiss).\n",
     "In this example, we will be running everything in a AWS EC2 instance (i.e. [g5.2xlarge]( https://aws.amazon.com/ec2/instance-types/g5/)). g5.2xlarge features one A10G GPU. We recommend running this notebook with at least one GPU equivalent to A10G with at least 16GB video memory.\n",
-    "There are certain techniques to downsize the Llama 2 7B model, so it can fit into smaller GPUs. But it is out of scope here.\n",
+    "There are certain techniques to downsize the Llama 3 7B model, so it can fit into smaller GPUs. But it is out of scope here.\n",
     "\n",
     "First, let's install all dependencies with PIP. We also recommend you start a dedicated Conda environment for better package management.\n",
     "\n",
@@ -109,7 +109,7 @@
     "### Data Processing\n",
     "\n",
     "First run all the imports and define the path of the data and vector storage after processing.\n",
-    "For the data, we will be using a raw pdf crawled from Llama 2 Getting Started guide on [Meta AI website](https://ai.meta.com/llama/)."
+    "For the data, we will be using a raw pdf crawled from \"Llama 2 Getting Started\" guide on [Meta AI website](https://ai.meta.com/llama/)."
    ]
   },
   {
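
The indexing cells themselves are unchanged in this diff; they boil down to something like the following sketch. The data directory and glob pattern are assumptions, and the embedding model used here must match the one used later when the index is reloaded with `FAISS.load_local`:

```python
from langchain.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.embeddings import OctoAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

DATA_PATH = "data/"                     # assumed location of the crawled PDF
DB_FAISS_PATH = "vectorstore/db_faiss"  # must match the path loaded later

# Load every PDF under DATA_PATH and split it into overlapping chunks
loader = DirectoryLoader(DATA_PATH, glob="*.pdf", loader_cls=PyPDFLoader)
splits = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(loader.load())

# Embed the chunks and persist the FAISS index to disk for the chatbot to reuse
embeddings = OctoAIEmbeddings(endpoint_url="https://text.octoai.run/v1/embeddings")
FAISS.from_documents(splits, embeddings).save_local(DB_FAISS_PATH)
```
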
@@ -276,14 +276,12 @@
     "from langchain.prompts.prompt import PromptTemplate\n",
     "from anyio.from_thread import start_blocking_portal #For model callback streaming\n",
     "\n",
-    "# langchain.debug=True\n",
-    "\n",
-    "#vector db path\n",
+    "# Vector db path\n",
     "DB_FAISS_PATH = 'vectorstore/db_faiss'\n",
     "\n",
     "model_dict = {\n",
-    "    \"13-chat\" : \"llama-2-13b-chat-fp16\",\n",
-    "    \"70b-chat\" : \"llama-2-70b-chat-fp16\",\n",
+    "    \"8b-instruct\" : \"meta-llama-3-8b-instruct\",\n",
+    "    \"70b-instruct\" : \"meta-llama-3-70b-instruct\",\n",
     "}\n",
     "\n",
     "system_message = {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"}"
@@ -303,22 +301,24 @@
    "outputs": [],
    "source": [
     "embeddings = OctoAIEmbeddings(endpoint_url=\"https://text.octoai.run/v1/embeddings\")\n",
-    "db = FAISS.load_local(DB_FAISS_PATH, embeddings)"
+    "db = FAISS.load_local(DB_FAISS_PATH, embeddings, allow_dangerous_deserialization=True)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Next we call the Llama 2 model from OctoAI. In this example we will use the Llama 2 13b chat FP16 model. You can find more on Llama 2 models on the [OctoAI text generation solution page](https://octoai.cloud/tools/text).\n",
+    "Next we call the Llama 3 model from OctoAI. In this example we will use the Llama 3 8b instruct model. You can find more on Llama models on the [OctoAI text generation solution page](https://octoai.cloud/text).\n",
     "\n",
     "At the time of writing this notebook the following Llama models are available on OctoAI:\n",
-    "* llama-2-13b-chat\n",
-    "* llama-2-70b-chat\n",
+    "* meta-llama-3-8b-instruct\n",
+    "* meta-llama-3-70b-instruct\n",
     "* codellama-7b-instruct\n",
     "* codellama-13b-instruct\n",
     "* codellama-34b-instruct\n",
-    "* codellama-70b-instruct"
+    "* llama-2-13b-chat\n",
+    "* llama-2-70b-chat\n",
+    "* llamaguard-7b"
    ]
   },
   {
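
The new `allow_dangerous_deserialization=True` flag is required by recent LangChain versions because FAISS indexes are stored as pickle files; only enable it for index files you created yourself. A quick sanity check that the reloaded index serves similarity queries (the query string is illustrative):

```python
# Print the three chunks most similar to an illustrative query
for doc in db.similarity_search("What models are available in Llama 3?", k=3):
    print(doc.page_content[:200], "\n---")
```
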
@@ -329,16 +329,10 @@
    "source": [
     "from langchain.llms.octoai_endpoint import OctoAIEndpoint\n",
     "\n",
-    "llama2_13b = \"llama-2-13b-chat-fp16\"\n",
     "llm = OctoAIEndpoint(\n",
-    "    endpoint_url=\"https://text.octoai.run/v1/chat/completions\",\n",
-    "    model_kwargs={\n",
-    "        \"model\": llama2_13b,\n",
-    "        \"messages\": [system_message],\n",
-    "        \"max_tokens\": 500,\n",
-    "        \"top_p\": 1,\n",
-    "        \"temperature\": 0.01\n",
-    "    },\n",
+    "    model=model_dict[\"8b-instruct\"],\n",
+    "    max_tokens=500,\n",
+    "    temperature=0.01\n",
     ")"
    ]
   },
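
A quick smoke test of the endpoint before wiring it into the chain (the prompt is illustrative):

```python
# One-off completion to confirm the OctoAI endpoint responds
print(llm.invoke("In one sentence, what is retrieval augmented generation?"))
```
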
@@ -347,7 +341,7 @@
    "metadata": {},
    "source": [
     "Next, we define the retriever and template for our RetrivalQA chain. For each call of the RetrievalQA, LangChain performs a semantic similarity search of the query in the vector database, then passes the search results as the context to Llama to answer the query about the data stored in the verctor database.\n",
-    "Whereas for the template, this defines the format of the question along with context that we will be sent into Llama for generation. In general, Llama 2 has special prompt format to handle special tokens. In some cases, the serving framework might already have taken care of it. Otherwise, you will need to write customized template to properly handle that."
+    "Whereas for the template, this defines the format of the question along with context that we will be sent into Llama for generation. In general, Llama 3 has special prompt format to handle special tokens. In some cases, the serving framework might already have taken care of it. Otherwise, you will need to write customized template to properly handle that."
    ]
   },
   {

+ 2 - 2
recipes/llama_api_providers/OctoAI_API_examples/RAG_Chatbot_example/requirements.txt

@@ -1,7 +1,7 @@
 gradio==4.16.0
 pypdf==4.0.0
-langchain==0.1.7
+langchain==0.1.19
 sentence-transformers==2.2.2
 faiss-cpu==1.7.4
 text-generation==0.6.1
-octoai-sdk==0.8.3
+octoai-sdk==0.10.1