@@ -29,14 +29,14 @@
"\n",
"This notebook will jump right in and show you what's the latest with our models and how to get the best out of them.\n",
"\n",
- "1. [Environment Setup](#env)\n",
- "2. [Loading the model](#load)\n",
- "3. [Long Context Demo](#longctx)\n",
- "4. [Text Conversations](#text)\n",
- "5. [Multilingual](#mling)\n",
- "6. [Multimodal: Single Image Understanding](#mm)\n",
- "7. [Multimodal: Multi Image Understanding](#mm2)\n",
- "8. [Function Calling with Image Understanding](#fc)"
+ "1. Environment Setup\n",
+ "2. Loading the model\n",
+ "3. Long Context Demo\n",
+ "4. Text Conversations\n",
+ "5. Multilingual\n",
+ "6. Multimodal: Single Image Understanding\n",
+ "7. Multimodal: Multi Image Understanding\n",
+ "8. Function Calling with Image Understanding"
]
},
{
@@ -46,7 +46,6 @@
"jp-MarkdownHeadingCollapsed": true
},
"source": [
- "<a id='env'></a>\n",
"## Environment Setup:\n",
"\n",
"* You'll need at least 4 GPUs with >= 80GB each.\n",
@@ -72,7 +71,6 @@
"id": "2fcf2b8b-5274-4a85-bec9-03ef99b20ce9",
"metadata": {},
"source": [
- "<a id='longctx'></a>\n",
"## Long Context Demo: Write a guide on SAM-2 based on the repo\n",
"\n",
"Scout supports up to 10M tokens of context. On 8xH100 in bf16, you can fit up to 1.4M tokens. We recommend using `vllm` for fast inference.\n",
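To put the long-context recommendation above into practice, a launch sketch for serving Scout with `vllm` across 8 GPUs follows; the model ID and the `--max-model-len` budget are assumptions to adjust for your hardware, not values from the notebook.

```shell
# Sketch: serve Llama 4 Scout tensor-parallel across 8 GPUs with a long
# context window. Tune --max-model-len down if you hit out-of-memory errors.
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --tensor-parallel-size 8 \
    --max-model-len 1000000
```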
@@ -991,7 +989,6 @@
"id": "17124706-e6b1-4e2a-b8a1-19e78243c5ac",
"metadata": {},
"source": [
- "<a id='text'></a>\n",
"## Text Conversations\n",
"\n",
"Llama 4 Scout continues to be a great conversationalist and can respond in various styles."
@@ -1074,7 +1071,6 @@
"id": "9c16037c-ea39-421d-b13b-853fa1db3858",
"metadata": {},
"source": [
- "<a id='mling'></a>\n",
"## Multilingual\n",
"\n",
"Llama 4 Scout is fluent in 12 languages:\n",
@@ -1135,7 +1131,6 @@
"id": "c4a5f841-aceb-43c6-9db7-b3f8e010a13b",
"metadata": {},
"source": [
- "<a id='mm'></a>\n",
"## Multimodal\n",
"Llama 4 Scout excels at image understanding. Note that the Llama models officially support only English for image understanding.\n",
"\n",
@@ -1187,7 +1182,6 @@
"id": "6f058767-d415-4c8c-9019-387b0adacc8e",
"metadata": {},
"source": [
- "<a id='mm1'></a>\n",
"### Multimodal: Understanding a Single Image\n",
"\n",
"Here's an example with 1 image:"
@@ -1262,7 +1256,6 @@
"id": "df47b0d1-0cd9-4437-b8b2-7cefa1e189a7",
"metadata": {},
"source": [
- "<a id='mm2'></a>\n",
"### Multimodal: Understanding Multiple Images\n",
"\n",
"Llama 4 Scout can process information from multiple images - the number of images you can pass in a single request is only limited by the available memory. To prevent OOM errors, try downsizing the images before passing them to the model."
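The downsizing step mentioned above can be sketched with Pillow's `thumbnail`, which shrinks in place while preserving aspect ratio; the 768px budget here is an arbitrary choice, not a limit from the notebook.

```python
from PIL import Image


def downsize(image: Image.Image, max_side: int = 768) -> Image.Image:
    """Return a copy whose longest side is at most max_side.

    thumbnail() preserves aspect ratio and is a no-op for images that
    already fit, so small images pass through unchanged.
    """
    img = image.copy()
    img.thumbnail((max_side, max_side))
    return img


# Example: a 4000x3000 image shrinks to 768x576 before being sent to the model.
big = Image.new("RGB", (4000, 3000))
small = downsize(big)
print(small.size)  # (768, 576)
```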
@@ -1350,7 +1343,6 @@
"id": "b472898e-9ffa-429e-b64e-d31c0ebdd3a6",
"metadata": {},
"source": [
- "<a id='fc'></a>\n",
"## Function Calling with Image Understanding\n",
"\n",
"Function calling now works natively with images, i.e., the model can understand the images and return the appropriate function call. In this example, we ask Llama to book us tickets to the place shown in the photos."
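As a minimal sketch of what such a request looks like, here is an OpenAI-style tool definition plus a user turn that mixes an image with text; the `book_tickets` tool, its parameters, and the image URL are hypothetical, and the exact field names depend on your serving stack.

```python
# Hypothetical tool the model can call after recognizing the place in the photo.
book_tickets_tool = {
    "type": "function",
    "function": {
        "name": "book_tickets",
        "description": "Book travel tickets to a destination.",
        "parameters": {
            "type": "object",
            "properties": {
                "destination": {"type": "string"},
                "num_tickets": {"type": "integer"},
            },
            "required": ["destination"],
        },
    },
}

# A single user turn combining an image with a text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/landmark.jpg"},
            },
            {"type": "text", "text": "Book me two tickets to the place in this photo."},
        ],
    }
]
```

The model would respond with a tool call such as `book_tickets(destination=..., num_tickets=2)`, which your application then executes.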