@@ -29,14 +29,14 @@
"\n",
"This notebook will jump right in and show you what's the latest with our models and how to get the best out of them.\n",
"\n",
- "1. [Environment Setup](#env)\n",
- "2. [Loading the model](#load)\n",
- "3. [Long Context Demo](#longctx)\n",
- "4. [Text Conversations](#text)\n",
- "5. [Multilingual](#mling)\n",
- "6. [Multimodal: Single Image Understanding](#mm)\n",
- "7. [Multimodal: Multi Image Understanding](#mm2)\n",
- "8. [Function Calling with Image Understanding](#fc)"
+ "1. Environment Setup\n",
+ "2. Loading the model\n",
+ "3. Long Context Demo\n",
+ "4. Text Conversations\n",
+ "5. Multilingual\n",
+ "6. Multimodal: Single Image Understanding\n",
+ "7. Multimodal: Multi Image Understanding\n",
+ "8. Function Calling with Image Understanding"
]
},
{
@@ -46,7 +46,6 @@
"jp-MarkdownHeadingCollapsed": true
},
"source": [
- "<a id='env'></a>\n",
"## Environment Setup:\n",
"\n",
"* You'll need at least 4 GPUs with >= 80GB each.\n",
@@ -72,7 +71,6 @@
"id": "2fcf2b8b-5274-4a85-bec9-03ef99b20ce9",
"metadata": {},
"source": [
- "<a id='longctx'></a>\n",
"## Long Context Demo: Write a guide on SAM-2 based on the repo\n",
"\n",
"Scout supports up to 10M tokens of context. On 8xH100 in bf16, you can fit up to 1.4M tokens. We recommend using `vllm` for fast inference.\n",
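To put the long-context recommendation above into practice, a launch sketch for serving Scout with `vllm` across 8 GPUs follows; the model ID and the `--max-model-len` budget are assumptions to adjust for your hardware, not values from the notebook.

```shell
# Sketch: serve Llama 4 Scout tensor-parallel across 8 GPUs with a long
# context window. Tune --max-model-len down if you hit out-of-memory errors.
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --tensor-parallel-size 8 \
    --max-model-len 1000000
```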
@@ -991,7 +989,6 @@
"id": "17124706-e6b1-4e2a-b8a1-19e78243c5ac",
"metadata": {},
"source": [
- "<a id='text'></a>\n",
"## Text Conversations\n",
"\n",
"Llama 4 Scout continues to be a great conversationalist and can respond in various styles."
@@ -1074,7 +1071,6 @@
"id": "9c16037c-ea39-421d-b13b-853fa1db3858",
"metadata": {},
"source": [
- "<a id='mling'></a>\n",
"## Multilingual\n",
"\n",
"Llama 4 Scout is fluent in 12 languages:\n",
@@ -1135,7 +1131,6 @@
"id": "c4a5f841-aceb-43c6-9db7-b3f8e010a13b",
"metadata": {},
"source": [
- "<a id='mm'></a>\n",
"## Multimodal\n",
"Llama 4 Scout excels at image understanding. Note that the Llama models officially support only English for image understanding.\n",
"\n",
@@ -1187,7 +1182,6 @@
"id": "6f058767-d415-4c8c-9019-387b0adacc8e",
"metadata": {},
"source": [
- "<a id='mm1'></a>\n",
"### Multimodal: Understanding a Single Image\n",
"\n",
"Here's an example with 1 image:"
@@ -1262,7 +1256,6 @@
"id": "df47b0d1-0cd9-4437-b8b2-7cefa1e189a7",
"metadata": {},
"source": [
- "<a id='mm2'></a>\n",
"### Multimodal: Understanding Multiple Images\n",
"\n",
"Llama 4 Scout can process information from multiple images - the number of images you can pass in a single request is only limited by the available memory. To prevent OOM errors, try downsizing the images before passing them to the model."
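The downsizing step mentioned above can be sketched with Pillow's `thumbnail`, which shrinks in place while preserving aspect ratio; the 768px budget here is an arbitrary choice, not a limit from the notebook.

```python
from PIL import Image


def downsize(image: Image.Image, max_side: int = 768) -> Image.Image:
    """Return a copy whose longest side is at most max_side.

    thumbnail() preserves aspect ratio and is a no-op for images that
    already fit, so small images pass through unchanged.
    """
    img = image.copy()
    img.thumbnail((max_side, max_side))
    return img


# Example: a 4000x3000 image shrinks to 768x576 before being sent to the model.
big = Image.new("RGB", (4000, 3000))
small = downsize(big)
print(small.size)  # (768, 576)
```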
@@ -1350,7 +1343,6 @@
"id": "b472898e-9ffa-429e-b64e-d31c0ebdd3a6",
"metadata": {},
"source": [
- "<a id='fc'></a>\n",
"## Function Calling with Image Understanding\n",
"\n",
"Function calling now works natively with images, i.e., the model can understand the images and return the appropriate function call. In this example, we ask Llama to book us tickets to the place shown in the photos."
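As a minimal sketch of what such a request looks like, here is an OpenAI-style tool definition plus a user turn that mixes an image with text; the `book_tickets` tool, its parameters, and the image URL are hypothetical, and the exact field names depend on your serving stack.

```python
# Hypothetical tool the model can call after recognizing the place in the photo.
book_tickets_tool = {
    "type": "function",
    "function": {
        "name": "book_tickets",
        "description": "Book travel tickets to a destination.",
        "parameters": {
            "type": "object",
            "properties": {
                "destination": {"type": "string"},
                "num_tickets": {"type": "integer"},
            },
            "required": ["destination"],
        },
    },
}

# A single user turn combining an image with a text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/landmark.jpg"},
            },
            {"type": "text", "text": "Book me two tickets to the place in this photo."},
        ],
    }
]
```

The model would respond with a tool call such as `book_tickets(destination=..., num_tickets=2)`, which your application then executes.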