@@ -5,7 +5,7 @@
"metadata": {},
"source": [
- "## Running Llama 3 on Mac, Windows or Linux\n",
+ "## Running Llama 3.1 on Mac, Windows or Linux\n",
- "This notebook goes over how you can set up and run Llama 3 locally on a Mac, Windows or Linux using [Ollama](https://ollama.com/)."
+ "This notebook goes over how you can set up and run Llama 3.1 locally on a Mac, Windows or Linux using [Ollama](https://ollama.com/)."
]
},
{
@@ -14,9 +14,9 @@
"source": [
"### Steps at a glance:\n",
"1. Download and install Ollama.\n",
- "2. Download and test run Llama 3.\n",
- "3. Use local Llama 3 via Python.\n",
- "4. Use local Llama 3 via LangChain.\n"
+ "2. Download and test run Llama 3.1.\n",
+ "3. Use local Llama 3.1 via Python.\n",
+ "4. Use local Llama 3.1 via LangChain.\n"
]
},
{
@@ -36,16 +36,16 @@
"source": [
- "#### 2. Download and test run Llama 3\n",
+ "#### 2. Download and test run Llama 3.1\n",
"\n",
- "On a terminal or console, run `ollama pull llama3` to download the Llama 3 8b chat model, in the 4-bit quantized format with size about 4.7 GB.\n",
+ "On a terminal or console, run `ollama pull llama3.1` to download the Llama 3.1 8b chat model in the 4-bit quantized format, which is about 4.7 GB.\n",
"\n",
- "Run `ollama pull llama3:70b` to download the Llama 3 70b chat model, also in the 4-bit quantized format with size 39GB.\n",
+ "Run `ollama pull llama3.1:70b` to download the Llama 3.1 70b chat model, also in the 4-bit quantized format, which is 39 GB.\n",
"\n",
- "Then you can run `ollama run llama3` and ask Llama 3 questions such as \"who wrote the book godfather?\" or \"who wrote the book godfather? answer in one sentence.\" You can also try `ollama run llama3:70b`, but the inference speed will most likely be too slow - for example, on an Apple M1 Pro with 32GB RAM, it takes over 10 seconds to generate one token using Llama 3 70b chat (vs over 10 tokens per second with Llama 3 8b chat).\n",
+ "Then you can run `ollama run llama3.1` and ask Llama 3.1 questions such as \"who wrote the book godfather?\" or \"who wrote the book godfather? answer in one sentence.\" You can also try `ollama run llama3.1:70b`, but the inference speed will most likely be too slow - for example, on an Apple M1 Pro with 32GB RAM, it takes over 10 seconds to generate one token using Llama 3.1 70b chat (vs over 10 tokens per second with Llama 3.1 8b chat).\n",
"\n",
- "You can also run the following command to test Llama 3 8b chat:\n",
+ "You can also run the following command to test Llama 3.1 8b chat:\n",
"```\n",
" curl http://localhost:11434/api/chat -d '{\n",
- " \"model\": \"llama3\",\n",
+ " \"model\": \"llama3.1\",\n",
" \"messages\": [\n",
" {\n",
" \"role\": \"user\",\n",
@@ -63,7 +63,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "#### 3. Use local Llama 3 via Python\n",
+ "#### 3. Use local Llama 3.1 via Python\n",
"\n",
"The Python code below is the port of the curl command above."
]
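For reference, a minimal sketch of what that Python port can look like, assuming the `requests` package and Ollama serving on its default port 11434; the notebook's actual cell is not shown in this hunk, so the structure below is illustrative:

```python
import requests

# Port of the curl command above: send one chat request to the local Ollama server.
# Assumes Ollama is running and `ollama pull llama3.1` has already completed.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1",
        "messages": [{"role": "user", "content": "who wrote the book godfather?"}],
        "stream": False,  # return a single JSON object instead of streamed chunks
    },
)
print(response.json()["message"]["content"])
```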
@@ -114,7 +114,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "#### 4. Use local Llama 3 via LangChain\n",
+ "#### 4. Use local Llama 3.1 via LangChain\n",
"\n",
- "Code below use LangChain with Ollama to query Llama 3 running locally. For a more advanced example of using local Llama 3 with LangChain and agent-powered RAG, see [this](https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb)."
+ "The code below uses LangChain with Ollama to query Llama 3.1 running locally. For a more advanced example of using local Llama 3 with LangChain and agent-powered RAG, see [this](https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb)."
]
@@ -136,7 +136,7 @@
"source": [
"from langchain_community.chat_models import ChatOllama\n",
"\n",
- "llm = ChatOllama(model=\"llama3\", temperature=0)\n",
+ "llm = ChatOllama(model=\"llama3.1\", temperature=0)\n",
"response = llm.invoke(\"who wrote the book godfather?\")\n",
"print(response.content)\n"
]
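Since the 70b model can be slow locally, the same `ChatOllama` object can also stream tokens as they are generated via LangChain's standard `stream` interface; a brief sketch for illustration, not a cell from the notebook:

```python
from langchain_community.chat_models import ChatOllama

# Print tokens as they arrive instead of waiting for the full invoke() result.
llm = ChatOllama(model="llama3.1", temperature=0)
for chunk in llm.stream("who wrote the book godfather? answer in one sentence."):
    print(chunk.content, end="", flush=True)
print()
```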