@@ -5,7 +5,7 @@
"metadata": {},
"source": [
- "## Running Llama 3 on Mac, Windows or Linux\n",
+ "## Running Llama 3.1 on Mac, Windows or Linux\n",
- "This notebook goes over how you can set up and run Llama 3 locally on a Mac, Windows or Linux using [Ollama](https://ollama.com/)."
+ "This notebook goes over how you can set up and run Llama 3.1 locally on a Mac, Windows or Linux using [Ollama](https://ollama.com/)."
]
},
{
@@ -14,9 +14,9 @@
"source": [
"### Steps at a glance:\n",
"1. Download and install Ollama.\n",
- "2. Download and test run Llama 3.\n",
- "3. Use local Llama 3 via Python.\n",
- "4. Use local Llama 3 via LangChain.\n"
+ "2. Download and test run Llama 3.1.\n",
+ "3. Use local Llama 3.1 via Python.\n",
+ "4. Use local Llama 3.1 via LangChain.\n"
]
},
{
@@ -36,16 +36,16 @@
"source": [
- "#### 2. Download and test run Llama 3\n",
+ "#### 2. Download and test run Llama 3.1\n",
"\n",
- "On a terminal or console, run `ollama pull llama3` to download the Llama 3 8b chat model, in the 4-bit quantized format with size about 4.7 GB.\n",
+ "On a terminal or console, run `ollama pull llama3.1` to download the Llama 3.1 8b chat model in the 4-bit quantized format, which is about 4.7 GB.\n",
"\n",
- "Run `ollama pull llama3:70b` to download the Llama 3 70b chat model, also in the 4-bit quantized format with size 39GB.\n",
+ "Run `ollama pull llama3.1:70b` to download the Llama 3.1 70b chat model, also in the 4-bit quantized format, which is 39 GB.\n",
"\n",
- "Then you can run `ollama run llama3` and ask Llama 3 questions such as \"who wrote the book godfather?\" or \"who wrote the book godfather? answer in one sentence.\" You can also try `ollama run llama3:70b`, but the inference speed will most likely be too slow - for example, on an Apple M1 Pro with 32GB RAM, it takes over 10 seconds to generate one token using Llama 3 70b chat (vs over 10 tokens per second with Llama 3 8b chat).\n",
+ "Then you can run `ollama run llama3.1` and ask Llama 3.1 questions such as \"who wrote the book godfather?\" or \"who wrote the book godfather? answer in one sentence.\" You can also try `ollama run llama3.1:70b`, but the inference speed will most likely be too slow - for example, on an Apple M1 Pro with 32GB RAM, it takes over 10 seconds to generate one token using Llama 3.1 70b chat (vs over 10 tokens per second with Llama 3.1 8b chat).\n",
"\n",
- "You can also run the following command to test Llama 3 8b chat:\n",
+ "You can also run the following command to test Llama 3.1 8b chat:\n",
"```\n",
" curl http://localhost:11434/api/chat -d '{\n",
- " \"model\": \"llama3\",\n",
+ " \"model\": \"llama3.1\",\n",
" \"messages\": [\n",
" {\n",
" \"role\": \"user\",\n",
@@ -63,7 +63,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "#### 3. Use local Llama 3 via Python\n",
+ "#### 3. Use local Llama 3.1 via Python\n",
"\n",
"The Python code below is the port of the curl command above."
]
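For reference, a minimal sketch of what that Python port can look like, assuming the `requests` package and Ollama serving on its default port 11434; the notebook's actual cell is not shown in this hunk, so the structure below is illustrative:

```python
import requests

# Port of the curl command above: send one chat request to the local Ollama server.
# Assumes Ollama is running and `ollama pull llama3.1` has already completed.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1",
        "messages": [{"role": "user", "content": "who wrote the book godfather?"}],
        "stream": False,  # return a single JSON object instead of streamed chunks
    },
)
print(response.json()["message"]["content"])
```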
@@ -114,7 +114,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "#### 4. Use local Llama 3 via LangChain\n",
+ "#### 4. Use local Llama 3.1 via LangChain\n",
"\n",
- "Code below use LangChain with Ollama to query Llama 3 running locally. For a more advanced example of using local Llama 3 with LangChain and agent-powered RAG, see [this](https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb)."
+ "The code below uses LangChain with Ollama to query Llama 3.1 running locally. For a more advanced example of using local Llama 3 with LangChain and agent-powered RAG, see [this](https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb)."
]
@@ -136,7 +136,7 @@
"source": [
"from langchain_community.chat_models import ChatOllama\n",
"\n",
- "llm = ChatOllama(model=\"llama3\", temperature=0)\n",
+ "llm = ChatOllama(model=\"llama3.1\", temperature=0)\n",
"response = llm.invoke(\"who wrote the book godfather?\")\n",
"print(response.content)\n"
]
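Since the 70b model can be slow locally, the same `ChatOllama` object can also stream tokens as they are generated via LangChain's standard `stream` interface; a brief sketch for illustration, not a cell from the notebook:

```python
from langchain_community.chat_models import ChatOllama

# Print tokens as they arrive instead of waiting for the full invoke() result.
llm = ChatOllama(model="llama3.1", temperature=0)
for chunk in llm.stream("who wrote the book godfather? answer in one sentence."):
    print(chunk.content, end="", flush=True)
print()
```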