|
@@ -7,8 +7,8 @@
|
|
|
"source": [
|
|
|
"## This demo app shows:\n",
|
|
|
"* How to use LangChain's YoutubeLoader to retrieve the caption in a YouTube video\n",
|
|
|
- "* How to ask Llama to summarize the content (per the Llama's input size limit) of the video in a naive way using LangChain's stuff method\n",
|
|
|
- "* How to bypass the limit of Llama's max input token size by using a more sophisticated way using LangChain's map_reduce and refine methods - see [here](https://python.langchain.com/docs/use_cases/summarization) for more info"
|
|
|
+ "* How to ask Llama 3 to summarize the content (per the Llama's input size limit) of the video in a naive way using LangChain's stuff method\n",
|
|
|
+ "* How to bypass the limit of Llama 3's max input token size by using a more sophisticated way using LangChain's map_reduce and refine methods - see [here](https://python.langchain.com/docs/use_cases/summarization) for more info"
|
|
|
]
|
|
|
},
|
|
|
{
|
|
@@ -22,7 +22,7 @@
|
|
|
"- [tiktoken](https://github.com/openai/tiktoken) BytePair Encoding tokenizer\n",
|
|
|
"- [pytube](https://pytube.io/en/latest/) Utility for downloading YouTube videos\n",
|
|
|
"\n",
|
|
|
- "**Note** This example uses OctoAI to host the Llama model. If you have not set up/or used OctoAI before, we suggest you take a look at the [HelloLlamaCloud](HelloLlamaCloud.ipynb) example for information on how to set up OctoAI before continuing with this example.\n",
|
|
|
+ "**Note** This example uses OctoAI to host the Llama 3 model. If you have not set up/or used OctoAI before, we suggest you take a look at the [HelloLlamaCloud](HelloLlamaCloud.ipynb) example for information on how to set up OctoAI before continuing with this example.\n",
|
|
|
"If you do not want to use OctoAI, you will need to make some changes to this notebook as you go along."
|
|
|
]
|
|
|
},
|
|
@@ -33,7 +33,7 @@
|
|
|
"metadata": {},
|
|
|
"outputs": [],
|
|
|
"source": [
|
|
|
- "!pip install langchain octoai-sdk youtube-transcript-api tiktoken pytube"
|
|
|
+ "!pip install langchain==0.1.19 youtube-transcript-api tiktoken pytube"
|
|
|
]
|
|
|
},
|
|
|
{
|
|
@@ -41,7 +41,7 @@
|
|
|
"id": "af3069b1",
|
|
|
"metadata": {},
|
|
|
"source": [
|
|
|
- "Let's load the YouTube video transcript using the YoutubeLoader."
|
|
|
+ "Let's first load a long (2:47:16) YouTube video (Lex Fridman with Yann Lecun: Meta AI, Open Source, Limits of LLMs, AGI & the Future of AI) transcript using the YoutubeLoader."
|
|
|
]
|
|
|
},
|
|
|
{
|
|
@@ -54,7 +54,7 @@
|
|
|
"from langchain.document_loaders import YoutubeLoader\n",
|
|
|
"\n",
|
|
|
"loader = YoutubeLoader.from_youtube_url(\n",
|
|
|
- " \"https://www.youtube.com/watch?v=1k37OcjH7BM\", add_video_info=True\n",
|
|
|
+ " \"https://www.youtube.com/watch?v=5t1vTLU7s40\", add_video_info=True\n",
|
|
|
")"
|
|
|
]
|
|
|
},
|
|
@@ -85,17 +85,16 @@
|
|
|
"id": "4af7cc16",
|
|
|
"metadata": {},
|
|
|
"source": [
|
|
|
- "We are using OctoAI in this example to host our Llama 2 model so you will need to get a OctoAI token.\n",
|
|
|
+ "You should see 142689 returned for the doc character length, which is about 30k words or 40k tokens, beyond the 8k context length limit of Llama 3. You'll see how to summarize a text longer than the limit.\n",
|
|
|
+ "\n",
|
|
|
+ "**Note**: We are using OctoAI in this example to host our Llama 3 model so you will need to get a OctoAI token.\n",
|
|
|
"\n",
|
|
|
"To get the OctoAI token:\n",
|
|
|
"\n",
|
|
|
"- You will need to first sign in with OctoAI with your github account\n",
|
|
|
"- Then create a free API token [here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token) that you can use for a while (a month or $10 in OctoAI credits, whichever one runs out first)\n",
|
|
|
"\n",
|
|
|
- "**Note** After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on OctoAI.\n",
|
|
|
- "\n",
|
|
|
- "Alternatively, you can run Llama locally. See:\n",
|
|
|
- "- [HelloLlamaLocal](HelloLlamaLocal.ipynb) for further information on how to run Llama locally."
|
|
|
+ "After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on OctoAI."
|
|
|
]
|
|
|
},
|
|
|
{
|
|
@@ -118,17 +117,17 @@
|
|
|
"id": "6b911efd",
|
|
|
"metadata": {},
|
|
|
"source": [
|
|
|
- "Next we call the Llama 2 model from OctoAI. In this example we will use the Llama 2 13b chat FP16 model. You can find more on Llama 2 models on the [OctoAI text generation solution page](https://octoai.cloud/tools/text).\n",
|
|
|
+ "Next we call the Llama 3 model from OctoAI. In this example we will use the Llama 3 8b instruct model. You can find more on Llama models on the [OctoAI text generation solution page](https://octoai.cloud/text).\n",
|
|
|
"\n",
|
|
|
"At the time of writing this notebook the following Llama models are available on OctoAI:\n",
|
|
|
- "* llama-2-13b-chat\n",
|
|
|
- "* llama-2-70b-chat\n",
|
|
|
+ "* meta-llama-3-8b-instruct\n",
|
|
|
+ "* meta-llama-3-70b-instruct\n",
|
|
|
"* codellama-7b-instruct\n",
|
|
|
"* codellama-13b-instruct\n",
|
|
|
"* codellama-34b-instruct\n",
|
|
|
- "* codellama-70b-instruct\n",
|
|
|
- "\n",
|
|
|
- "If you using local Llama, just set llm accordingly - see the [HelloLlamaLocal notebook](HelloLlamaLocal.ipynb)"
|
|
|
+ "* llama-2-13b-chat\n",
|
|
|
+ "* llama-2-70b-chat\n",
|
|
|
+ "* llamaguard-7b"
|
|
|
]
|
|
|
},
|
|
|
{
|
|
@@ -140,21 +139,11 @@
|
|
|
"source": [
|
|
|
"from langchain.llms.octoai_endpoint import OctoAIEndpoint\n",
|
|
|
"\n",
|
|
|
- "llama2_13b = \"llama-2-13b-chat-fp16\"\n",
|
|
|
+ "llama3_8b = \"meta-llama-3-8b-instruct\"\n",
|
|
|
"llm = OctoAIEndpoint(\n",
|
|
|
- " endpoint_url=\"https://text.octoai.run/v1/chat/completions\",\n",
|
|
|
- " model_kwargs={\n",
|
|
|
- " \"model\": llama2_13b,\n",
|
|
|
- " \"messages\": [\n",
|
|
|
- " {\n",
|
|
|
- " \"role\": \"system\",\n",
|
|
|
- " \"content\": \"You are a helpful, respectful and honest assistant.\"\n",
|
|
|
- " }\n",
|
|
|
- " ],\n",
|
|
|
- " \"max_tokens\": 500,\n",
|
|
|
- " \"top_p\": 1,\n",
|
|
|
- " \"temperature\": 0.01\n",
|
|
|
- " },\n",
|
|
|
+ " model=llama3_8b,\n",
|
|
|
+ " max_tokens=500,\n",
|
|
|
+ " temperature=0.01\n",
|
|
|
")"
|
|
|
]
|
|
|
},
|
|
@@ -163,7 +152,7 @@
|
|
|
"id": "8e3baa56",
|
|
|
"metadata": {},
|
|
|
"source": [
|
|
|
- "Once everything is set up, we prompt Llama 2 to summarize the first 4000 characters of the transcript for us."
|
|
|
+ "Once everything is set up, we prompt Llama 3 to summarize the first 4000 characters of the transcript for us."
|
|
|
]
|
|
|
},
|
|
|
{
|
|
@@ -173,90 +162,74 @@
|
|
|
"metadata": {},
|
|
|
"outputs": [],
|
|
|
"source": [
|
|
|
- "from langchain.prompts import ChatPromptTemplate\n",
|
|
|
+ "from langchain.prompts import PromptTemplate\n",
|
|
|
"from langchain.chains import LLMChain\n",
|
|
|
- "prompt = ChatPromptTemplate.from_template(\n",
|
|
|
- " \"Give me a summary of the text below: {text}?\"\n",
|
|
|
+ "\n",
|
|
|
+ "prompt_template = \"Give me a summary of the text below: {text}?\"\n",
|
|
|
+ "prompt = PromptTemplate(\n",
|
|
|
+ " input_variables=[\"text\"], template=prompt_template\n",
|
|
|
")\n",
|
|
|
- "chain = LLMChain(llm=llm, prompt=prompt)\n",
|
|
|
+ "chain = prompt | llm\n",
|
|
|
+ "\n",
|
|
|
"# be careful of the input text length sent to LLM\n",
|
|
|
- "text = docs[0].page_content[:4000]\n",
|
|
|
- "summary = chain.run(text)\n",
|
|
|
- "# this is the summary of the first 4000 characters of the video content\n",
|
|
|
+ "text = docs[0].page_content[:10000]\n",
|
|
|
+ "summary = chain.invoke(text)\n",
|
|
|
+ "\n",
|
|
|
+ "# Note: The context length of 8k tokens in Llama 3 is roughly 6000-7000 words or 32k characters\n",
|
|
|
"print(summary)"
|
|
|
]
|
|
|
},
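Before moving on, it helps to see why the full transcript will not fit. As a quick sanity check, you can estimate the token count from the character length (the ~4 characters/token ratio is a rough assumption; use tiktoken for an exact count):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the common ~4 characters/token heuristic."""
    return int(len(text) / chars_per_token)

# the transcript reported above is 142689 characters, far beyond Llama 3's 8k context
print(estimate_tokens("x" * 142689))  # roughly 35672 estimated tokens
```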
|
|
|
{
|
|
|
"cell_type": "markdown",
|
|
|
- "id": "8b684b29",
|
|
|
+ "id": "1ad1881a",
|
|
|
"metadata": {},
|
|
|
"source": [
|
|
|
- "Next we try to summarize all the content of the transcript and we should get a `RuntimeError: Your input is too long. Max input length is 4096 tokens, but you supplied 5597 tokens.`."
|
|
|
+ "If you try the whole content which has over 142k characters, about 40k tokens, which exceeds the 8k limit, you'll get an empty result (OctoAI used to return an error \"BadRequestError: The token count (32704) of your prompt (32204) + your setting of `max_tokens` (500) cannot exceed this model's context length (8192).\")."
|
|
|
]
|
|
|
},
|
|
|
{
|
|
|
"cell_type": "code",
|
|
|
"execution_count": null,
|
|
|
- "id": "88a2c17f",
|
|
|
+ "id": "61a088b7-cba2-4603-ba7c-f6673bfaa3cd",
|
|
|
"metadata": {},
|
|
|
"outputs": [],
|
|
|
"source": [
|
|
|
- "# try to get a summary of the whole content\n",
|
|
|
+ "# this will generate an empty result because the input exceeds Llama 3's context length limit\n",
|
|
|
"text = docs[0].page_content\n",
|
|
|
- "summary = chain.run(text)\n",
|
|
|
+ "summary = llm.invoke(f\"Give me a summary of the text below: {text}.\")\n",
|
|
|
"print(summary)"
|
|
|
]
|
|
|
},
|
|
|
{
|
|
|
"cell_type": "markdown",
|
|
|
- "id": "1ad1881a",
|
|
|
+ "id": "e112845f-de16-4c2f-8afe-6cca31f6fa38",
|
|
|
"metadata": {},
|
|
|
"source": [
|
|
|
+ "To fix this, you can use LangChain's load_summarize_chain method (detail [here](https://python.langchain.com/docs/use_cases/summarization)).\n",
|
|
|
"\n",
|
|
|
- "Let's try some workarounds to see if we can summarize the entire transcript without running into the `RuntimeError`.\n",
|
|
|
+ "First you'll create splits or sub-documents of the original content, then use the LangChain's `load_summarize_chain` with the `refine` or `map_reduce type`.\n",
|
|
|
"\n",
|
|
|
- "We will use the LangChain's `load_summarize_chain` and play around with the `chain_type`.\n"
|
|
|
+ "Because this may involve many calls to Llama 3, it'd be great to set up a quick free LangChain API key [here](https://smith.langchain.com/settings), run the following cell to set up necessary environment variables, and check the logs on [LangSmith](https://docs.smith.langchain.com/) during and after the run."
|
|
|
]
|
|
|
},
|
|
|
{
|
|
|
"cell_type": "code",
|
|
|
"execution_count": null,
|
|
|
- "id": "9bfee2d3-3afe-41d9-8968-6450cc23f493",
|
|
|
+ "id": "55586a09-db53-4741-87d8-fdfb40d9f8cb",
|
|
|
"metadata": {},
|
|
|
"outputs": [],
|
|
|
"source": [
|
|
|
- "from langchain.chains.summarize import load_summarize_chain\n",
|
|
|
- "# see https://python.langchain.com/docs/use_cases/summarization for more info\n",
|
|
|
- "chain = load_summarize_chain(llm, chain_type=\"stuff\") # other supported methods are map_reduce and refine\n",
|
|
|
- "chain.run(docs)\n",
|
|
|
- "# same RuntimeError: Your input is too long. but stuff works for shorter text with input length <= 4096 tokens"
|
|
|
- ]
|
|
|
- },
|
|
|
- {
|
|
|
- "cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
- "id": "682799a8-3846-41b1-a908-02ab5ac3ecee",
|
|
|
- "metadata": {},
|
|
|
- "outputs": [],
|
|
|
- "source": [
|
|
|
- "chain = load_summarize_chain(llm, chain_type=\"refine\")\n",
|
|
|
- "# still get the \"RuntimeError: Your input is too long. Max input length is 4096 tokens\"\n",
|
|
|
- "chain.run(docs)"
|
|
|
- ]
|
|
|
- },
|
|
|
- {
|
|
|
- "cell_type": "markdown",
|
|
|
- "id": "aecf6328",
|
|
|
- "metadata": {},
|
|
|
- "source": [
|
|
|
- "\n",
|
|
|
- "Since the transcript is bigger than the model can handle, we can split the transcript into chunks instead and use the [`refine`](https://python.langchain.com/docs/modules/chains/document/refine) `chain_type` to iteratively create an answer."
|
|
|
+ "import os\n",
|
|
|
+ "os.environ[\"LANGCHAIN_API_KEY\"] = \"your_langchain_api_key\"\n",
|
|
|
+ "os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
|
|
|
+ "os.environ[\"LANGCHAIN_PROJECT\"] = \"Video Summary with Llama 3\""
|
|
|
]
|
|
|
},
|
|
|
{
|
|
|
"cell_type": "code",
|
|
|
"execution_count": null,
|
|
|
- "id": "3be1236a-fe6a-4bf6-983f-0e72dde39fee",
|
|
|
+ "id": "9bfee2d3-3afe-41d9-8968-6450cc23f493",
|
|
|
"metadata": {},
|
|
|
"outputs": [],
|
|
|
"source": [
|
|
@@ -264,7 +237,7 @@
|
|
|
"\n",
|
|
|
"# we need to split the long input text\n",
|
|
|
"text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n",
|
|
|
- " chunk_size=3000, chunk_overlap=0\n",
|
|
|
+ " chunk_size=1000, chunk_overlap=0\n",
|
|
|
")\n",
|
|
|
"split_docs = text_splitter.split_documents(docs)"
|
|
|
]
|
|
@@ -272,7 +245,7 @@
|
|
|
{
|
|
|
"cell_type": "code",
|
|
|
"execution_count": null,
|
|
|
- "id": "12ae9e9d-3434-4a84-a298-f2b98de9ff01",
|
|
|
+ "id": "682799a8-3846-41b1-a908-02ab5ac3ecee",
|
|
|
"metadata": {},
|
|
|
"outputs": [],
|
|
|
"source": [
|
|
@@ -281,81 +254,61 @@
|
|
|
]
|
|
|
},
|
|
|
{
|
|
|
- "cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
- "id": "127f17fe-d5b7-43af-bd2f-2b47b076d0b1",
|
|
|
- "metadata": {},
|
|
|
- "outputs": [],
|
|
|
- "source": [
|
|
|
- "# now get the summary of the whole docs - the whole youtube content\n",
|
|
|
- "chain = load_summarize_chain(llm, chain_type=\"refine\")\n",
|
|
|
- "print(str(chain.run(split_docs)))"
|
|
|
- ]
|
|
|
- },
|
|
|
- {
|
|
|
"cell_type": "markdown",
|
|
|
- "id": "c3976c92",
|
|
|
+ "id": "aecf6328",
|
|
|
"metadata": {},
|
|
|
"source": [
|
|
|
- "You can also use [`map_reduce`](https://python.langchain.com/docs/modules/chains/document/map_reduce) `chain_type` to implement a map reduce like architecture while summarizing the documents."
|
|
|
+ "The `refine` type implements the following steps under the hood:\n",
|
|
|
+ "\n",
|
|
|
+ "1. Call Llama 3 on the first sub-document to generate a concise summary;\n",
|
|
|
+ "2. Loop over each subsequent sub-document, pass the previous summary with the current sub-document to generate a refined new summary;\n",
|
|
|
+ "3. Return the final summary generated on the final sub-document as the final answer - the summary of the whole content.\n",
|
|
|
+ "\n",
|
|
|
+ "An example prompt template for each call in step 2, which gets used under the hood by LangChain, is:\n",
|
|
|
+ "\n",
|
|
|
+ "```\n",
|
|
|
+ "Your job is to produce a final summary.\n",
|
|
|
+ "We have provided an existing summary up to a certain point:\n",
|
|
|
+ "<previous_summary>\n",
|
|
|
+ "Refine the existing summary (only if needed) with some more content below:\n",
|
|
|
+ "<new_content>\n",
|
|
|
+ "```\n",
|
|
|
+ "\n",
|
|
|
+ "**Note**: The following call will make 33 calls to Llama 3 and genereate the final summary in about 10 minutes."
|
|
|
]
|
|
|
},
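The refine loop described above can be sketched in plain Python (`llm_call` is a hypothetical stand-in for whatever function sends a prompt to the model and returns its text):

```python
def refine_summarize(llm_call, chunks):
    """Sketch of the 'refine' strategy: one LLM call per chunk, carrying the summary forward."""
    summary = llm_call(f"Write a concise summary of:\n{chunks[0]}")
    for chunk in chunks[1:]:
        summary = llm_call(
            "Your job is to produce a final summary.\n"
            "We have provided an existing summary up to a certain point:\n"
            f"{summary}\n"
            "Refine the existing summary (only if needed) with some more content below:\n"
            f"{chunk}"
        )
    return summary
```

With 33 splits this makes 33 sequential LLM calls, which is why the real chain takes several minutes.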
|
|
|
{
|
|
|
"cell_type": "code",
|
|
|
"execution_count": null,
|
|
|
- "id": "8991df49-8578-46de-8b30-cb2cd11e30f1",
|
|
|
+ "id": "3be1236a-fe6a-4bf6-983f-0e72dde39fee",
|
|
|
"metadata": {},
|
|
|
"outputs": [],
|
|
|
"source": [
|
|
|
- "# another method is map_reduce\n",
|
|
|
- "chain = load_summarize_chain(llm, chain_type=\"map_reduce\")\n",
|
|
|
- "print(str(chain.run(split_docs)))"
|
|
|
+ "from langchain.chains.summarize import load_summarize_chain\n",
|
|
|
+ "\n",
|
|
|
+ "chain = load_summarize_chain(llm, chain_type=\"refine\")\n",
|
|
|
+ "print(chain.run(split_docs))"
|
|
|
]
|
|
|
},
|
|
|
{
|
|
|
"cell_type": "markdown",
|
|
|
- "id": "77d580de",
|
|
|
+ "id": "752f2b71-5fd6-4a8a-ac09-371bce1db703",
|
|
|
"metadata": {},
|
|
|
"source": [
|
|
|
- "To investigate further, let's turn on Langchain's debug mode on to get an idea of how many calls are made to the model and the details of the inputs and outputs.\n",
|
|
|
- "We will then run our summary using the `stuff` and `refine` `chain_types` and take a look at our output."
|
|
|
- ]
|
|
|
- },
|
|
|
- {
|
|
|
- "cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
- "id": "f2138911-d2b9-41f3-870f-9bc37e2043d9",
|
|
|
- "metadata": {},
|
|
|
- "outputs": [],
|
|
|
- "source": [
|
|
|
- "# to find how many calls to Llama have been made and the details of inputs and outputs of each call, set langchain to debug\n",
|
|
|
- "import langchain\n",
|
|
|
- "langchain.debug = True\n",
|
|
|
+ "You can also set `chain_type` to `map_reduce` to generate the summary of the entire content using the standard map and reduce method, which works behind the scene by first mapping each split document to a sub-summary via a call to LLM, then combines all those sub-summaries into a single final summary by yet another call to LLM.\n",
|
|
|
"\n",
|
|
|
- "# stuff method will cause the error in the end\n",
|
|
|
- "chain = load_summarize_chain(llm, chain_type=\"stuff\")\n",
|
|
|
- "chain.run(split_docs)"
|
|
|
+ "**Note**: The following call takes about 3 minutes and all the calls to Llama 3."
|
|
|
]
|
|
|
},
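For contrast with `refine`, the map and reduce steps are independent per chunk; a minimal sketch (again with a hypothetical `llm_call`):

```python
def map_reduce_summarize(llm_call, chunks):
    """Sketch of the 'map_reduce' strategy: summarize each chunk independently, then combine."""
    # map: one independent summary per chunk (these calls could run in parallel)
    partials = [llm_call(f"Write a concise summary of:\n{c}") for c in chunks]
    # reduce: one final call that merges the per-chunk summaries
    return llm_call("Combine these summaries into a single summary:\n" + "\n".join(partials))
```

Because the map calls are independent, this is typically faster end-to-end than `refine`, at the cost of losing cross-chunk context during the map step.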
|
|
|
{
|
|
|
"cell_type": "code",
|
|
|
"execution_count": null,
|
|
|
- "id": "60d1a531-ab48-45cc-a7de-59a14e18240d",
|
|
|
+ "id": "8991df49-8578-46de-8b30-cb2cd11e30f1",
|
|
|
"metadata": {},
|
|
|
"outputs": [],
|
|
|
"source": [
|
|
|
- "# but refine works\n",
|
|
|
- "chain = load_summarize_chain(llm, chain_type=\"refine\")\n",
|
|
|
- "chain.run(split_docs)"
|
|
|
- ]
|
|
|
- },
|
|
|
- {
|
|
|
- "cell_type": "markdown",
|
|
|
- "id": "61ccd0fb-5cdb-43c4-afaf-05bc9f7cf959",
|
|
|
- "metadata": {},
|
|
|
- "source": [
|
|
|
- "\n",
|
|
|
- "As you can see, `stuff` fails because it tries to treat all the split documents as one and \"stuffs\" it into one prompt which leads to a much larger prompt than Llama 2 can handle while `refine` iteratively runs over the documents updating its answer as it goes."
|
|
|
+ "chain = load_summarize_chain(llm, chain_type=\"map_reduce\")\n",
|
|
|
+ "print(chain.run(split_docs))"
|
|
|
]
|
|
|
}
|
|
|
],
|