{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Running Llama2 on Mac\n",
    "This notebook goes over how you can set up and run Llama2 locally on a Mac using llama-cpp-python and the llama-cpp's quantized Llama2 model. It also goes over how to use LangChain to ask Llama general questions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Steps at a glance:\n",
    "1. Use CMAKE and install required packages\n",
    "2. Request download of model weights from the Llama website\n",
    "3. Clone the llama repo and get the weights\n",
    "4. Clone the llamacpp repo and quantize the model\n",
    "5. Prepare the script\n",
    "6. Run the example\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "
\n",
    "\n",
    "#### 1. Use CMAKE and install required packages\n",
    "\n",
    "Type the following command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#CMAKE_ARGS=\"-DLLAMA_METAL=on\" FORCE_CMAKE=1: sets the appropriate build configuration options for the llama-cpp-python package \n",
    "#and enables the use of Metal in Mac and forces the use of CMake as the build system.\n",
    "!CMAKE_ARGS=\"-DLLAMA_METAL=on\" FORCE_CMAKE=1 pip install llama-cpp-python\n",
    "\n",
    "#pip install llama-cpp-python: installs the llama-cpp-python package and its dependencies:\n",
    "!pip install pypdf sentence-transformers chromadb langchain"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If running without a Jupyter notebook, use the command without the `!`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A brief look at the installed libraries:\n",
    "- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) a simple Python bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) library\n",
    "- pypdf gives us the ability to work with pdfs\n",
    "- sentence-transformers for text embeddings\n",
    "- chromadb gives us database capabilities \n",
    "- langchain provides necessary RAG tools for this demo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "
\n",
    "\n",
    "#### 2. Request download of model weights from the Llama website\n",
    "Before you can run the model locally, you will need to get the model weights. To get the model weights, visit the [Llama website](https://llama.meta.com/) and click on “download models”. \n",
    "Fill  the required information, select the models “Llama 2 & Llama Chat” and accept the terms & conditions. You will receive a URL in your email in a short time.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "
\n",
    "\n",
    "#### 3. Clone the llama repo and get the weights\n",
    "Git clone the [Llama repo](https://github.com/facebookresearch/llama.git). Enter the URL and get 13B weights. This example demonstrates a llama2 model with 13B parameters, but the steps we follow would be similar for other llama models, as well as for other parameter models.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "
\n",
    "\n",
    "#### 4. Clone the llamacpp repo and quantize the model\n",
    "* Git clone the [Llamacpp repo](https://github.com/ggerganov/llama.cpp). \n",
    "* Enter the repo:\n",
    "`cd llama.cpp`\n",
    "* Install requirements:\n",
    "`python3 -m pip install -r requirements.txt`\n",
    "* Convert the weights:\n",
    "`python convert.py `\n",
    "* Run make to generate the 'quantize' method that we will use in the next step\n",
    "`make`\n",
    "* Quantize the weights:\n",
    "`./quantize /ggml-model-f16.gguf /ggml-model-q4_0.gguf q4_0`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "#### 5. Prepare the script\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# mentions the instance of the Llama model that we will use\n",
    "from langchain.llms import LlamaCpp\n",
    "\n",
    "# defines a chain of operations that can be performed on text input to generate the output using the LLM\n",
    "from langchain.chains import LLMChain\n",
    "\n",
    "# manages callbacks that are triggered at various stages during the execution of an LLMChain\n",
    "from langchain.callbacks.manager import CallbackManager\n",
    "\n",
    "# defines a callback that streams the output of the LLMChain to the console in real-time as it gets generated\n",
    "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
    "\n",
    "# allows to define prompt templates that can be used to generate custom inputs for the LLM\n",
    "from langchain.prompts import PromptTemplate\n",
    "\n",
    "\n",
    "# Initialize the langchain CallBackManager. This handles callbacks from Langchain and for this example we will use \n",
    "# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question\n",
    "callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])\n",
    "\n",
    "# Set up the model\n",
    "llm = LlamaCpp(\n",
    "    model_path=\"\",\n",
    "    temperature=0.0,\n",
    "    top_p=1,\n",
    "    n_ctx=6000,\n",
    "    callback_manager=callback_manager, \n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 6. Run the example\n",
    "\n",
    "With the model set up, you are now ready to ask some questions. \n",
    "\n",
    "Here is an example of the simplest way to ask the model some general questions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run the example\n",
    "question = \"who wrote the book Pride and Prejudice?\"\n",
    "answer = llm(question)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, you can use LangChain's `PromptTemplate` for some flexibility in your prompts and questions. For more information on LangChain's prompt template visit this [link](https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt = PromptTemplate.from_template(\n",
    "    \"who wrote {book}?\"\n",
    ")\n",
    "chain = LLMChain(llm=llm, prompt=prompt)\n",
    "answer = chain.run(\"A tale of two cities\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}