Browse Source

Update API examples to use Llama 3 and the latest LangChain function signatures

Thierry Moreau 11 months ago
parent
commit
b203238af4

+ 38 - 37
recipes/llama_api_providers/OctoAI_API_examples/Getting_to_know_Llama.ipynb

@@ -13,6 +13,41 @@
   {
    "cell_type": "markdown",
    "metadata": {
+    "id": "h3YGMDJidHtH"
+   },
+   "source": [
+    "### **Install dependencies**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "VhN6hXwx7FCp"
+   },
+   "outputs": [],
+   "source": [
+    "# Install dependencies and initialize\n",
+    "%pip install \\\n",
+    "    langchain==0.1.19 \\\n",
+    "    matplotlib \\\n",
+    "    octoai-sdk==0.10.1 \\\n",
+    "    openai \\\n",
+    "    sentence_transformers \\\n",
+    "    pdf2image \\\n",
+    "    pdfminer \\\n",
+    "    pdfminer.six \\\n",
+    "    unstructured \\\n",
+    "    faiss-cpu \\\n",
+    "    pillow-heif \\\n",
+    "    opencv-python \\\n",
+    "    unstructured-inference \\\n",
+    "    pikepdf"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
     "id": "ioVMNcTesSEk"
    },
    "source": [
@@ -245,40 +280,6 @@
    ]
   },
   {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "h3YGMDJidHtH"
-   },
-   "source": [
-    "### **2.1 - Install dependencies**"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "VhN6hXwx7FCp"
-   },
-   "outputs": [],
-   "source": [
-    "# Install dependencies and initialize\n",
-    "%pip install -qU \\\n",
-    "    langchain==0.1.19 \\\n",
-    "    octoai-sdk==0.10.1 \\\n",
-    "    openai \\\n",
-    "    sentence_transformers \\\n",
-    "    pdf2image \\\n",
-    "    pdfminer \\\n",
-    "    pdfminer.six \\\n",
-    "    unstructured \\\n",
-    "    faiss-cpu \\\n",
-    "    pillow-heif \\\n",
-    "    opencv-python \\\n",
-    "    unstructured-inference \\\n",
-    "    pikepdf"
-   ]
-  },
-  {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
@@ -359,7 +360,7 @@
     "id": "5Jxq0pmf6L73"
    },
    "source": [
-    "### **2.2 - Basic completion**"
+    "# **2.1 - Basic completion**"
    ]
   },
   {
@@ -380,7 +381,7 @@
     "id": "StccjUDh6W0Q"
    },
    "source": [
-    "### **2.3 - System prompts**\n"
+    "## **2.2 - System prompts**\n"
    ]
   },
   {
@@ -404,7 +405,7 @@
     "id": "Hp4GNa066pYy"
    },
    "source": [
-    "### **2.4 - Response formats**\n",
+    "### **2.3 - Response formats**\n",
     "* Can support different formatted outputs e.g. text, JSON, etc."
    ]
   },
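The completion, system-prompt, and response-format cells behind sections 2.1-2.3 are unchanged and therefore absent from this hunk. As a rough sketch of what they exercise, using the `OctoAIEndpoint(model=..., max_tokens=..., temperature=...)` signature this commit adopts in the other notebooks (the prompts and the plain-text system-prompt handling are illustrative assumptions; the token is read from the `OCTOAI_API_TOKEN` environment variable):

```python
from langchain.llms.octoai_endpoint import OctoAIEndpoint

# Llama 3 8B Instruct on OctoAI; assumes OCTOAI_API_TOKEN is set in the environment
llm = OctoAIEndpoint(
    model="meta-llama-3-8b-instruct",
    max_tokens=500,
    temperature=0.01,
)

# 2.1 - Basic completion
print(llm.invoke("Who wrote the book Innovator's Dilemma?"))

# 2.2 - System prompts: with a plain completion endpoint, the system prompt
# can simply be prepended to the user prompt (assumed pattern)
system = "You are a helpful, respectful and honest assistant."
print(llm.invoke(f"{system}\n\nExplain what a tokenizer does in one sentence."))

# 2.3 - Response formats: steer the output format from the prompt
print(llm.invoke("List three uses of Llama 3 as a JSON array of strings."))
```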

+ 24 - 34
recipes/llama_api_providers/OctoAI_API_examples/HelloLlamaCloud.ipynb

@@ -6,13 +6,12 @@
    "metadata": {},
    "source": [
     "## This demo app shows:\n",
-    "* How to run Llama2 in the cloud hosted on OctoAI\n",
+    "* How to run Llama 3 in the cloud hosted on OctoAI\n",
     "* How to use LangChain to ask Llama general questions and follow up questions\n",
-    "* How to use LangChain to load a recent PDF doc - the Llama2 paper pdf - and chat about it. This is the well known RAG (Retrieval Augmented Generation) method to let LLM such as Llama2 be able to answer questions about the data not publicly available when Llama2 was trained, or about your own data. RAG is one way to prevent LLM's hallucination\n",
-    "* You should also review the [HelloLlamaLocal](HelloLlamaLocal.ipynb) notebook for more information on RAG\n",
+    "* How to use LangChain to load a recent PDF doc - the Llama paper pdf - and chat about it. This is the well known RAG (Retrieval Augmented Generation) method to let LLM such as Llama be able to answer questions about your own data. RAG is one way to prevent LLM's hallucination\n",
     "\n",
     "**Note** We will be using OctoAI to run the examples here. You will need to first sign into [OctoAI](https://octoai.cloud/) with your Github or Google account, then create a free API token [here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token) that you can use for a while (a month or $10 in OctoAI credits, whichever one runs out first).\n",
-    "After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on OctoAI."
+    "After the free trial ends, you will need to enter billing info to continue to use Llama 3 hosted on OctoAI."
    ]
   },
   {
@@ -35,7 +34,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "!pip install langchain octoai-sdk sentence-transformers chromadb pypdf"
+    "%pip install langchain==0.1.19 octoai-sdk==0.10.1 openai sentence-transformers chromadb pypdf"
    ]
   },
   {
@@ -57,15 +56,17 @@
    "id": "3e8870c1",
    "metadata": {},
    "source": [
-    "Next we call the Llama 2 model from OctoAI. In this example we will use the Llama 2 13b chat FP16 model. You can find more on Llama 2 models on the [OctoAI text generation solution page](https://octoai.cloud/tools/text).\n",
+    "Next we call the Llama 3 model from OctoAI. In this example we will use the Llama 3 8b instruct model. You can find more on Llama models on the [OctoAI text generation solution page](https://octoai.cloud/text).\n",
     "\n",
     "At the time of writing this notebook the following Llama models are available on OctoAI:\n",
-    "* llama-2-13b-chat\n",
-    "* llama-2-70b-chat\n",
+    "* meta-llama-3-8b-instruct\n",
+    "* meta-llama-3-70b-instruct\n",
     "* codellama-7b-instruct\n",
     "* codellama-13b-instruct\n",
     "* codellama-34b-instruct\n",
-    "* codellama-70b-instruct"
+    "* llama-2-13b-chat\n",
+    "* llama-2-70b-chat\n",
+    "* llamaguard-7b"
    ]
   },
   {
@@ -77,21 +78,11 @@
    "source": [
     "from langchain.llms.octoai_endpoint import OctoAIEndpoint\n",
     "\n",
-    "llama2_13b = \"llama-2-13b-chat-fp16\"\n",
+    "llama3_8b = \"meta-llama-3-8b-instruct\"\n",
     "llm = OctoAIEndpoint(\n",
-    "    endpoint_url=\"https://text.octoai.run/v1/chat/completions\",\n",
-    "    model_kwargs={\n",
-    "        \"model\": llama2_13b,\n",
-    "        \"messages\": [\n",
-    "            {\n",
-    "                \"role\": \"system\",\n",
-    "                \"content\": \"You are a helpful, respectful and honest assistant.\"\n",
-    "            }\n",
-    "        ],\n",
-    "        \"max_tokens\": 500,\n",
-    "        \"top_p\": 1,\n",
-    "        \"temperature\": 0.01\n",
-    "    },\n",
+    "    model=llama3_8b,\n",
+    "    max_tokens=500,\n",
+    "    temperature=0.01\n",
     ")"
    ]
   },
@@ -111,7 +102,7 @@
    "outputs": [],
    "source": [
     "question = \"who wrote the book Innovator's dilemma?\"\n",
-    "answer = llm(question)\n",
+    "answer = llm.invoke(question)\n",
     "print(answer)"
    ]
   },
@@ -134,7 +125,7 @@
    "source": [
     "# chat history not passed so Llama doesn't have the context and doesn't know this is more about the book\n",
     "followup = \"tell me more\"\n",
-    "followup_answer = llm(followup)\n",
+    "followup_answer = llm.invoke(followup)\n",
     "print(followup_answer)"
    ]
   },
@@ -162,7 +153,7 @@
     "memory = ConversationBufferMemory()\n",
     "conversation = ConversationChain(\n",
     "    llm=llm, \n",
-    "    memory = memory,\n",
+    "    memory=memory,\n",
     "    verbose=False\n",
     ")"
    ]
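The chain above is built but not exercised in this hunk. A sketch of how the memory-backed conversation is typically driven in LangChain 0.1.x (the prompts are reused from the cells above; `predict` is an assumed but standard entry point for `ConversationChain`):

```python
# With memory attached, the follow-up now carries the chat history,
# so "tell me more" is understood in the context of the first question.
answer = conversation.predict(input="who wrote the book Innovator's dilemma?")
print(answer)

followup_answer = conversation.predict(input="tell me more")
print(followup_answer)
```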
@@ -208,11 +199,10 @@
    "id": "fc436163",
    "metadata": {},
    "source": [
-    "Next, let's explore using Llama 2 to answer questions using documents for context. \n",
-    "This gives us the ability to update Llama 2's knowledge thus giving it better context without needing to finetune. \n",
-    "For a more in-depth study of this, see the notebook on using Llama 2 locally [here](HelloLlamaLocal.ipynb)\n",
+    "Next, let's explore using Llama 3 to answer questions using documents for context. \n",
+    "This gives us the ability to update Llama 3's knowledge thus giving it better context without needing to finetune. \n",
     "\n",
-    "We will use the PyPDFLoader to load in a pdf, in this case, the Llama 2 paper."
+    "We will use the PyPDFLoader to load in a pdf, in this case, the Llama paper."
    ]
   },
   {
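The loading and indexing cells referenced here are unchanged and not shown in the diff. A sketch of the usual LangChain 0.1.19 pipeline with the packages installed above (pypdf, sentence-transformers, chromadb); the paper URL and chunking parameters are illustrative assumptions:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# Load the Llama 2 paper (illustrative URL) and split it into chunks
loader = PyPDFLoader("https://arxiv.org/pdf/2307.09288.pdf")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
splits = splitter.split_documents(docs)

# Embed the chunks locally via sentence-transformers and store them in Chroma
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=HuggingFaceEmbeddings(),
)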
@@ -301,7 +291,7 @@
    "id": "54ad02d7",
    "metadata": {},
    "source": [
-    "We then use ` RetrievalQA` to retrieve the documents from the vector database and give the model more context on Llama 2, thereby increasing its knowledge.\n",
+    "We then use ` RetrievalQA` to retrieve the documents from the vector database and give the model more context on Llama, thereby increasing its knowledge.\n",
     "\n",
     "For each question, LangChain performs a semantic similarity search of it in the vector db, then passes the search results as the context to Llama to answer the question."
    ]
@@ -321,7 +311,7 @@
     "    retriever=vectordb.as_retriever()\n",
     ")\n",
     "\n",
-    "question = \"What is llama2?\"\n",
+    "question = \"What is llama?\"\n",
     "result = qa_chain({\"query\": question})\n",
     "print(result['result'])"
    ]
@@ -344,7 +334,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# no context passed so Llama2 doesn't have enough context to answer so it lets its imagination go wild\n",
+    "# no context passed so Llama doesn't have enough context to answer so it lets its imagination go wild\n",
     "result = qa_chain({\"query\": \"what are its use cases?\"})\n",
     "print(result['result'])"
    ]
@@ -376,7 +366,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# let's ask the original question \"What is llama2?\" again\n",
+    "# let's ask the original question \"What is llama?\" again\n",
     "result = chat_chain({\"question\": question, \"chat_history\": []})\n",
     "print(result['answer'])"
    ]

+ 67 - 143
recipes/llama_api_providers/OctoAI_API_examples/LiveData.ipynb

@@ -7,12 +7,12 @@
    "source": [
     "## This demo app shows:\n",
     "* How to use LlamaIndex, an open source library to help you build custom data augmented LLM applications\n",
-    "* How to ask Llama questions about recent live data via the You.com live search API and LlamaIndex\n",
+    "* How to ask Llama 3 questions about recent live data via the Tavily live search API\n",
     "\n",
-    "The LangChain package is used to facilitate the call to Llama2 hosted on OctoAI\n",
+    "The LangChain package is used to facilitate the call to Llama 3 hosted on OctoAI\n",
     "\n",
     "**Note** We will be using OctoAI to run the examples here. You will need to first sign into [OctoAI](https://octoai.cloud/) with your Github or Google account, then create a free API token [here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token) that you can use for a while (a month or $10 in OctoAI credits, whichever one runs out first).\n",
-    "After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on OctoAI."
+    "After the free trial ends, you will need to enter billing info to continue to use Llama3 hosted on OctoAI."
    ]
   },
   {
@@ -32,23 +32,13 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "!pip install llama-index langchain"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "21fe3849",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# use ServiceContext to configure the LLM used and the custom embeddings\n",
-    "from llama_index import ServiceContext\n",
-    "\n",
-    "# VectorStoreIndex is used to index custom data \n",
-    "from llama_index import VectorStoreIndex\n",
-    "\n",
-    "from langchain.llms.octoai_endpoint import OctoAIEndpoint"
+    "!pip install llama-index \n",
+    "!pip install llama-index-core\n",
+    "!pip install llama-index-llms-octoai\n",
+    "!pip install llama-index-embeddings-octoai\n",
+    "!pip install octoai-sdk\n",
+    "!pip install tavily-python\n",
+    "!pip install replicate"
    ]
   },
   {
@@ -75,227 +65,161 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f8ff812b",
-   "metadata": {},
-   "source": [
-    "In this example we will use the [YOU.com](https://you.com/) search engine to augment the LLM's responses.\n",
-    "To use the You.com Search API, you can email api@you.com to request an API key. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "75275628-5235-4b55-8033-601c76107528",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "YOUCOM_API_KEY = getpass()\n",
-    "os.environ[\"YOUCOM_API_KEY\"] = YOUCOM_API_KEY"
-   ]
-  },
-  {
-   "cell_type": "markdown",
    "id": "cb210c7c",
    "metadata": {},
    "source": [
-    "We then call the Llama 2 model from OctoAI.\n",
+    "We then call the Llama 3 model from OctoAI.\n",
     "\n",
-    "We will use the Llama 2 13b chat FP16 model. You can find more on Llama 2 models on the [OctoAI text generation solution page](https://octoai.cloud/tools/text).\n",
+    "We will use the Llama 3 8b instruct model. You can find more on Llama models on the [OctoAI text generation solution page](https://octoai.cloud/text).\n",
     "\n",
     "At the time of writing this notebook the following Llama models are available on OctoAI:\n",
-    "* llama-2-13b-chat\n",
-    "* llama-2-70b-chat\n",
+    "* meta-llama-3-8b-instruct\n",
+    "* meta-llama-3-70b-instruct\n",
     "* codellama-7b-instruct\n",
     "* codellama-13b-instruct\n",
     "* codellama-34b-instruct\n",
-    "* codellama-70b-instruct"
+    "* llama-2-13b-chat\n",
+    "* llama-2-70b-chat\n",
+    "* llamaguard-7b"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c12fc2cb",
+   "id": "21fe3849",
    "metadata": {},
    "outputs": [],
    "source": [
-    "# set llm to be using Llama2 hosted on OctoAI\n",
-    "llama2_13b = \"llama-2-13b-chat-fp16\"\n",
+    "# use ServiceContext to configure the LLM used and the custom embeddings\n",
+    "from llama_index.core import ServiceContext\n",
     "\n",
-    "llm = OctoAIEndpoint(\n",
-    "    endpoint_url=\"https://text.octoai.run/v1/chat/completions\",\n",
-    "    model_kwargs={\n",
-    "        \"model\": llama2_13b,\n",
-    "        \"messages\": [\n",
-    "            {\n",
-    "                \"role\": \"system\",\n",
-    "                \"content\": \"You are a helpful, respectful and honest assistant.\"\n",
-    "            }\n",
-    "        ],\n",
-    "        \"max_tokens\": 500,\n",
-    "        \"top_p\": 1,\n",
-    "        \"temperature\": 0.01\n",
-    "    },\n",
-    ")"
+    "# VectorStoreIndex is used to index custom data \n",
+    "from llama_index.core import VectorStoreIndex\n",
+    "\n",
+    "from llama_index.core import Settings, VectorStoreIndex\n",
+    "from llama_index.embeddings.octoai import OctoAIEmbedding\n",
+    "from llama_index.llms.octoai import OctoAI\n",
+    "\n",
+    "Settings.llm = OctoAI(\n",
+    "    model=\"meta-llama-3-8b-instruct\",\n",
+    "    token=OCTOAI_API_TOKEN,\n",
+    "    temperature=0.0,\n",
+    "    max_tokens=128,\n",
+    ")\n",
+    "\n",
+    "Settings.embed_model = OctoAIEmbedding(api_key=OCTOAI_API_TOKEN)"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "476d72da",
+   "id": "f8ff812b",
    "metadata": {},
    "source": [
-    "Using our api key we set up earlier, we make a request from YOU.com for live data on a particular topic."
+    "Next you will use the [Tavily](https://tavily.com/) search engine to augment the Llama 3's responses. To create a free trial Tavily Search API, sign in with your Google or Github account [here](https://app.tavily.com/sign-in)."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "effc9656-b18d-4d24-a80b-6066564a838b",
+   "id": "75275628-5235-4b55-8033-601c76107528",
    "metadata": {},
    "outputs": [],
    "source": [
-    "import requests\n",
+    "from tavily import TavilyClient\n",
     "\n",
-    "query = \"Meta Connect\" # you can try other live data query about sports score, stock market and weather info \n",
-    "headers = {\"X-API-Key\": os.environ[\"YOUCOM_API_KEY\"]}\n",
-    "data = requests.get(\n",
-    "    f\"https://api.ydc-index.io/search?query={query}\",\n",
-    "    headers=headers,\n",
-    ").json()"
+    "TAVILY_API_KEY = getpass()\n",
+    "tavily = TavilyClient(api_key=TAVILY_API_KEY)"
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "8bed3baf-742e-473c-ada1-4459012a8a2c",
+   "cell_type": "markdown",
+   "id": "476d72da",
    "metadata": {},
-   "outputs": [],
    "source": [
-    "# check the query result in JSON\n",
-    "import json\n",
-    "\n",
-    "print(json.dumps(data, indent=2))"
+    "Do a live web search on \"Llama 3 fine-tuning\"."
    ]
   },
   {
-   "cell_type": "markdown",
-   "id": "b196e697",
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "effc9656-b18d-4d24-a80b-6066564a838b",
    "metadata": {},
+   "outputs": [],
    "source": [
-    "We then use the [`JSONLoader`](https://llamahub.ai/l/file-json) to extract the text from the returned data. The `JSONLoader` gives us the ability to load the data into LamaIndex.\n",
-    "In the next cell we show how to load the JSON result with key info stored as \"snippets\".\n",
-    "\n",
-    "However, you can also add the snippets in the query result to documents like below:\n",
-    "```python \n",
-    "from llama_index import Document\n",
-    "snippets = [snippet for hit in data[\"hits\"] for snippet in hit[\"snippets\"]]\n",
-    "documents = [Document(text=s) for s in snippets]\n",
-    "```\n",
-    "This can be handy if you just need to add a list of text strings to doc"
+    "response = tavily.search(query=\"Llama 3 fine-tuning\")\n",
+    "context = [{\"url\": obj[\"url\"], \"content\": obj[\"content\"]} for obj in response['results']]"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "7c40e73f-ca13-4f4a-a753-e613df3d389e",
+   "id": "6b5af98b-c26b-4fd7-8031-31ac4915cdac",
    "metadata": {},
    "outputs": [],
    "source": [
-    "# one way to load the JSON result with key info stored as \"snippets\"\n",
-    "from llama_index import download_loader\n",
-    "\n",
-    "JsonDataReader = download_loader(\"JsonDataReader\")\n",
-    "loader = JsonDataReader()\n",
-    "documents = loader.load_data([hit[\"snippets\"] for hit in data[\"hits\"]])\n"
+    "context"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "8e5e3b4e",
+   "id": "0f4ea96b-bb00-4a1f-8bd2-7f15237415f6",
    "metadata": {},
    "source": [
-    "With the data set up, we create a vector store for the data and a query engine for it.\n",
-    "\n",
-    "For our embeddings we will use `OctoAIEmbeddings` whose default embedding model is GTE-Large. This model provides a good balance between speed and performance.\n",
-    "\n",
-    "For more info see https://octoai.cloud/tools/text/embeddings?mode=demo&model=thenlper%2Fgte-large. "
+    "Create documents based on the search results, index and save them to a vector store, then create a query engine."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "a5de3080-2c4b-479c-baba-793b3bee36ed",
+   "id": "7513ac70-155a-4d56-b326-0e8c2733ab99",
    "metadata": {},
    "outputs": [],
    "source": [
-    "# use OctoAI embeddings \n",
-    "from langchain_community.embeddings import OctoAIEmbeddings\n",
-    "from llama_index.embeddings import LangchainEmbedding\n",
-    "\n",
-    "\n",
-    "embeddings = LangchainEmbedding(OctoAIEmbeddings(\n",
-    "    endpoint_url=\"https://text.octoai.run/v1/embeddings\"\n",
-    "))\n",
-    "print(embeddings)\n",
-    "\n",
-    "# create a ServiceContext instance to use Llama2 and custom embeddings\n",
-    "service_context = ServiceContext.from_defaults(llm=llm, chunk_size=800, chunk_overlap=20, embed_model=embeddings)\n",
+    "from llama_index.core import Document\n",
     "\n",
-    "# create vector store index from the documents created above\n",
-    "index = VectorStoreIndex.from_documents(documents, service_context=service_context)\n",
+    "documents = [Document(text=ct['content']) for ct in context]\n",
+    "index = VectorStoreIndex.from_documents(documents)\n",
     "\n",
-    "# create query engine from the index\n",
-    "query_engine = index.as_query_engine(streaming=False)"
+    "query_engine = index.as_query_engine(streaming=True)"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "2c4ea012",
+   "id": "df743c62-165c-4834-b1f1-7d7848a6815e",
    "metadata": {},
    "source": [
-    "We are now ready to ask Llama 2 a question about the live data using our query engine."
+    "You are now ready to ask Llama 3 questions about the live data using the query engine."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "de91a191-d0f2-498e-88dc-b2b43423e0e5",
+   "id": "b2fd905b-575a-45f1-88da-9b093caa232a",
    "metadata": {},
    "outputs": [],
    "source": [
-    "# ask Llama2 a summary question about the search result\n",
     "response = query_engine.query(\"give me a summary\")\n",
-    "print(str(response))"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "72814b20-06aa-4da8-b4dd-f0b0d74a2ea0",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# more questions\n",
-    "print(str(query_engine.query(\"what products were announced\")))"
+    "response.print_response_stream()"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "a65bc037-a689-476d-b529-0059a27bc949",
+   "id": "88c45380-1d00-46d5-80ac-0eff68fd1f8a",
    "metadata": {},
    "outputs": [],
    "source": [
-    "print(str(query_engine.query(\"tell me more about Meta AI assistant\")))"
+    "query_engine.query(\"what's the latest about Llama 3 fine-tuning?\").print_response_stream()"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "16a56542",
+   "id": "0fe54976-5345-4426-a6f0-dc3bfd45dac3",
    "metadata": {},
    "outputs": [],
    "source": [
-    "print(str(query_engine.query(\"what are Generative AI stickers\")))"
+    "query_engine.query(\"tell me more about Llama 3 fine-tuning\").print_response_stream()"
    ]
   }
  ],
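The new cells keep the `VectorStoreIndex` in memory only. If you want to save the index to disk so the OctoAI embedding endpoint is not re-queried on every run, llama-index's standard storage round-trip looks like this (a sketch; the persist directory name is an assumption):

```python
from llama_index.core import StorageContext, load_index_from_storage

# Persist the in-memory index to disk...
index.storage_context.persist(persist_dir="./live_data_index")

# ...and reload it later without re-embedding the documents.
storage_context = StorageContext.from_defaults(persist_dir="./live_data_index")
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine(streaming=True)
```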