@@ -7,11 +7,11 @@
"source": [
"<a href=\"https://colab.research.google.com/github/meta-llama/llama-recipes/blob/main/recipes/quickstart/Prompt_Engineering_with_Llama_3.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
"\n",
- "# Prompt Engineering with Llama 3\n",
+ "# Prompt Engineering with Llama 3.1\n",
"\n",
"Prompt engineering is using natural language to produce a desired response from a large language model (LLM).\n",
"\n",
- "This interactive guide covers prompt engineering & best practices with Llama 3."
+ "This interactive guide covers prompt engineering & best practices with Llama 3.1."
]
},
{
@@ -45,6 +45,15 @@
"\n",
"Llama models come in varying parameter sizes. The smaller models are cheaper to deploy and run; the larger models are more capable.\n",
"\n",
+ "#### Llama 3.1\n",
+ "1. `llama-3.1-8b` - base pretrained 8 billion parameter model\n",
+ "1. `llama-3.1-70b` - base pretrained 70 billion parameter model\n",
+ "1. `llama-3.1-405b` - base pretrained 405 billion parameter model\n",
+ "1. `llama-3.1-8b-instruct` - instruction fine-tuned 8 billion parameter model\n",
+ "1. `llama-3.1-70b-instruct` - instruction fine-tuned 70 billion parameter model\n",
+ "1. `llama-3.1-405b-instruct` - instruction fine-tuned 405 billion parameter model (flagship)\n",
+ "\n",
+ "\n",
"#### Llama 3\n",
"1. `llama-3-8b` - base pretrained 8 billion parameter model\n",
"1. `llama-3-70b` - base pretrained 70 billion parameter model\n",
@@ -133,7 +142,7 @@
"\n",
"Tokens matter most when you consider API pricing and internal behavior (ex. hyperparameters).\n",
"\n",
- "Each model has a maximum context length that your prompt cannot exceed. That's 8K tokens for Llama 3, 4K for Llama 2, and 100K for Code Llama. \n"
+ "Each model has a maximum context length that your prompt cannot exceed. That's 128k tokens for Llama 3.1, 4K for Llama 2, and 100K for Code Llama.\n"
]
},
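To make the context-window figure above concrete, here is a minimal, illustrative sketch of counting prompt tokens against the Llama 3.1 limit. It is not part of the notebook diff; it assumes `transformers` is installed and that you have access to the gated `meta-llama/Meta-Llama-3.1-8B-Instruct` tokenizer on Hugging Face.

```python
# Sketch only (not from the notebook): count tokens in a prompt and compare
# against the Llama 3.1 context window. Assumes access to the gated
# meta-llama/Meta-Llama-3.1-8B-Instruct repo on Hugging Face.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

prompt = "Explain the difference between a token and a word in one sentence."
num_tokens = len(tokenizer.encode(prompt))

LLAMA_3_1_CONTEXT_WINDOW = 128_000  # tokens
print(f"Prompt uses {num_tokens} tokens; "
      f"{LLAMA_3_1_CONTEXT_WINDOW - num_tokens} remain for the response.")
```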
{
@@ -143,7 +152,7 @@
"source": [
"## Notebook Setup\n",
"\n",
- "The following APIs will be used to call LLMs throughout the guide. As an example, we'll call Llama 3 chat using [Grok](https://console.groq.com/playground?model=llama3-70b-8192).\n",
+ "The following APIs will be used to call LLMs throughout the guide. As an example, we'll call Llama 3.1 chat using [Grok](https://console.groq.com/playground?model=llama3-70b-8192).\n",
"\n",
"To install prerequisites run:"
]
@@ -171,8 +180,9 @@
"# Get a free API key from https://console.groq.com/keys\n",
"os.environ[\"GROQ_API_KEY\"] = \"YOUR_GROQ_API_KEY\"\n",
"\n",
- "LLAMA3_70B_INSTRUCT = \"llama3-70b-8192\"\n",
- "LLAMA3_8B_INSTRUCT = \"llama3-8b-8192\"\n",
+ "LLAMA3_405B_INSTRUCT = \"llama-3.1-405b-reasoning\" # Note: Groq currently only gives access here to paying customers for 405B model\n",
+ "LLAMA3_70B_INSTRUCT = \"llama-3.1-70b-versatile\"\n",
+ "LLAMA3_8B_INSTRUCT = \"llama3.1-8b-instant\"\n",
"\n",
"DEFAULT_MODEL = LLAMA3_70B_INSTRUCT\n",
"\n",
@@ -225,7 +235,7 @@
"source": [
"### Completion APIs\n",
"\n",
- "Let's try Llama 3!"
+ "Let's try Llama 3.1!"
]
},
{
@@ -488,7 +498,7 @@
"\n",
"Simply adding a phrase encouraging step-by-step thinking \"significantly improves the ability of large language models to perform complex reasoning\" ([Wei et al. (2022)](https://arxiv.org/abs/2201.11903)). This technique is called \"CoT\" or \"Chain-of-Thought\" prompting.\n",
"\n",
- "Llama 3 now reasons step-by-step naturally without the addition of the phrase. This section remains for completeness."
+ "Llama 3.1 now reasons step-by-step naturally without the addition of the phrase. This section remains for completeness."
]
},
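To illustrate the chain-of-thought phrasing described in this cell, the same question can be sent with and without the step-by-step nudge. This reuses the illustrative `chat_completion` sketch from earlier and is not code from the notebook; the arithmetic question is the standard example from Wei et al. (2022).

```python
# Illustrative only: compare a plain prompt with a chain-of-thought prompt.
question = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)

print(chat_completion(question))                                 # direct answer
print(chat_completion(question + " Let's think step by step."))  # CoT-style answer
```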
{
@@ -704,7 +714,7 @@
"source": [
"### Limiting Extraneous Tokens\n",
"\n",
- "A common struggle with Llama 2 is getting output without extraneous tokens (ex. \"Sure! Here's more information on...\"), even if explicit instructions are given to Llama 2 to be concise and no preamble. Llama 3 can better follow instructions.\n",
+ "A common struggle with Llama 2 is getting output without extraneous tokens (ex. \"Sure! Here's more information on...\"), even if explicit instructions are given to Llama 2 to be concise and no preamble. Llama 3.x can better follow instructions.\n",
"\n",
"Check out this improvement that combines a role, rules and restrictions, explicit instructions, and an example:"
]
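The notebook's own improved prompt follows in a cell outside this hunk. As a rough sketch of the pattern it names (a role, rules and restrictions, explicit instructions, and an example), a system prompt along these lines could suppress preamble; the wording below is illustrative, not the notebook's, and reuses the Groq client from the earlier sketch.

```python
# Illustrative sketch (not the notebook's example): role + rules + explicit
# instructions + a one-shot example, aimed at suppressing preamble.
response = client.chat.completions.create(
    model="llama-3.1-70b-versatile",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a sentiment classifier. "
                "Rules: reply with exactly one word - positive, negative, or neutral. "
                "Do not explain your answer or add any preamble. "
                "Example: 'I loved it' -> positive"
            ),
        },
        {"role": "user", "content": "The battery died after an hour."},
    ],
)
print(response.choices[0].message.content)  # expected: negative
```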