@@ -29,14 +29,14 @@
     "\n",
     "This notebook will jump right in and show you what's the latest with our models and how to get the best out of them.\n",
     "\n",
-    "1. [Environment Setup](#env)\n",
-    "2. [Loading the model](#load)\n",
-    "3. [Long Context Demo](#longctx)\n",
-    "4. [Text Conversations](#text)\n",
-    "5. [Multilingual](#mling)\n",
-    "6. [Multimodal: Single Image Understanding](#mm)\n",
-    "7. [Multimodal: Multi Image Understanding](#mm2)\n",
-    "8. [Function Calling with Image Understanding](#fc)"
+    "1. Environment Setup\n",
+    "2. Loading the model\n",
+    "3. Long Context Demo\n",
+    "4. Text Conversations\n",
+    "5. Multilingual\n",
+    "6. Multimodal: Single Image Understanding\n",
+    "7. Multimodal: Multi Image Understanding\n",
+    "8. Function Calling with Image Understanding"
    ]
   },
   {
@@ -46,7 +46,6 @@
     "jp-MarkdownHeadingCollapsed": true
    },
    "source": [
-    "<a id='env'></a>\n",
     "## Environment Setup:\n",
     "\n",
     "* You'll need at least 4 GPUs with >= 80GB each.\n",
@@ -72,7 +71,6 @@
    "id": "2fcf2b8b-5274-4a85-bec9-03ef99b20ce9",
    "metadata": {},
    "source": [
-    "<a id='longctx'></a>\n",
     "## Long Context Demo: Write a guide on SAM-2 based on the repo\n",
     "\n",
     "Scout supports up to 10M tokens of context. On 8xH100 in bf16, you can get up to 1.4M tokens. We recommend using `vllm` for fast inference.\n",
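
The serving cell itself falls outside this hunk. As a rough sketch, offline long-context inference with vLLM might look like the following; the checkpoint name, `max_model_len`, and the prompt file are assumptions for illustration, not values taken from the notebook:

```python
# Sketch: long-context inference with vLLM (illustrative values, not tuned).
from pathlib import Path

from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed checkpoint name
    tensor_parallel_size=8,   # shard across 8 GPUs, e.g. 8xH100
    max_model_len=1_000_000,  # raise toward 1.4M if memory allows
)

# Hypothetical dump of the SAM-2 repo concatenated into one text file.
repo_text = Path("sam2_repo_dump.txt").read_text()

messages = [
    {"role": "user", "content": "Write a short guide to SAM-2 based on this repo:\n" + repo_text}
]
outputs = llm.chat(messages, SamplingParams(temperature=0.6, max_tokens=2048))
print(outputs[0].outputs[0].text)
```
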
@@ -991,7 +989,6 @@
    "id": "17124706-e6b1-4e2a-b8a1-19e78243c5ac",
    "metadata": {},
    "source": [
-    "<a id='text'></a>\n",
     "## Text Conversations\n",
     "\n",
     "Llama 4 Scout continues to be a great conversationalist and can respond in various styles."
@@ -1074,7 +1071,6 @@
    "id": "9c16037c-ea39-421d-b13b-853fa1db3858",
    "metadata": {},
    "source": [
-    "<a id='mling'></a>\n",
     "## Multilingual\n",
     "\n",
     "Llama 4 Scout is fluent in 12 languages:\n",
@@ -1135,7 +1131,6 @@
    "id": "c4a5f841-aceb-43c6-9db7-b3f8e010a13b",
    "metadata": {},
    "source": [
-    "<a id='mm'></a>\n",
     "## Multimodal\n",
     "Llama 4 Scout excels at image understanding. Note that the Llama models officially support only English for image understanding.\n",
     "\n",
@@ -1187,7 +1182,6 @@
    "id": "6f058767-d415-4c8c-9019-387b0adacc8e",
    "metadata": {},
    "source": [
-    "<a id='mm1'></a>\n",
     "### Multimodal: Understanding a Single Image\n",
     "\n",
     "Here's an example with one image:"
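
The example cell is not part of this hunk. For reference, a minimal single-image request in the Hugging Face `transformers` chat-message format might look like this; the checkpoint name is assumed and the image URL is a placeholder:

```python
# Sketch: single-image understanding via transformers' chat template.
from transformers import AutoProcessor, Llama4ForConditionalGeneration

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(model_id, device_map="auto")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
        {"type": "text", "text": "Describe this image in two sentences."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.batch_decode(out[:, inputs["input_ids"].shape[-1]:])[0])
```
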
@@ -1262,7 +1256,6 @@
    "id": "df47b0d1-0cd9-4437-b8b2-7cefa1e189a7",
    "metadata": {},
    "source": [
-    "<a id='mm2'></a>\n",
     "### Multimodal: Understanding Multiple Images\n",
     "\n",
     "Llama 4 Scout can process information from multiple images; the number of images you can pass in a single request is limited only by the available memory. To prevent OOM errors, try downsizing the images before passing them to the model."
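
The downsizing advice above is easy to apply before building the request. A minimal sketch with Pillow; the 1024px cap and the file paths are arbitrary choices for illustration, not model requirements:

```python
# Sketch: shrink images before sending them to the model to avoid OOM.
from PIL import Image


def downsize(path: str, max_side: int = 1024) -> Image.Image:
    """Load an image and cap its longest side at max_side pixels."""
    img = Image.open(path)
    img.thumbnail((max_side, max_side))  # resizes in place, preserves aspect ratio
    return img


# Placeholder paths; pass the resulting images into the multi-image request.
images = [downsize(p) for p in ["beach.jpg", "street.jpg"]]
```
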
@@ -1350,7 +1343,6 @@
    "id": "b472898e-9ffa-429e-b64e-d31c0ebdd3a6",
    "metadata": {},
    "source": [
-    "<a id='fc'></a>\n",
     "## Function Calling with Image Understanding\n",
     "\n",
     "Function calling now works natively with images, i.e., the model can understand the images and return the appropriate function call. In this example, we ask Llama to book us tickets to the place shown in the photos."
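
The booking example itself is not shown in this diff. As a sketch of the request shape, a tool definition and an image can be combined against an OpenAI-compatible endpoint such as one served by vLLM; the `book_tickets` tool, the server address, and the image URL are all hypothetical:

```python
# Sketch: function calling with an image over an OpenAI-compatible API.
from openai import OpenAI

# Hypothetical local vLLM server exposing the OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "book_tickets",  # hypothetical tool for this example
        "description": "Book travel tickets to a destination.",
        "parameters": {
            "type": "object",
            "properties": {"destination": {"type": "string"}},
            "required": ["destination"],
        },
    },
}]

resp = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed checkpoint name
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/landmark.jpg"}},
            {"type": "text", "text": "Book me tickets to the place in this photo."},
        ],
    }],
    tools=tools,
)
# If the model recognizes the landmark, it should return a book_tickets call here.
print(resp.choices[0].message.tool_calls)
```
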
 |