This folder showcases Llama2-powered apps. If you need a general understanding of GenAI, Llama2, prompt engineering and RAG, be sure to first check the Getting to know Llama 2 notebook and its Meta Connect video here.
We start with three quickstart demos that show how to run Llama2 locally on a Mac, remotely in the cloud, and in Google Colab, to ask Llama2 general questions or questions about unstructured data the model was not trained on.
We then show three demos that ask Llama2 to summarize a YouTube video, to answer questions about structured data stored in a database, and to answer questions about live search results.
More advanced Llama2 demo apps will be coming soon.
The first three demo apps show:
To run Llama2 locally on a Mac using llama-cpp-python, first open the notebook HelloLlamaLocal. Then replace <path-to-ggml-model-q4_0.gguf> in the notebook with the path to either your downloaded quantized model file here, or the ggml-model-q4_0.gguf file built with the following commands:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
python3 -m pip install -r requirements.txt
python convert.py <path_to_your_downloaded_llama-2-13b_model>
./quantize <path_to_your_downloaded_llama-2-13b_model>/ggml-model-f16.gguf <path_to_your_downloaded_llama-2-13b_model>/ggml-model-q4_0.gguf q4_0
The HelloLlama cloud version uses LangChain with Llama2 hosted in the cloud on Replicate. The demo shows how to ask Llama2 general questions and follow-up questions, and how to use LangChain to ask Llama2 questions about unstructured data stored in a PDF.
Note on using Replicate: To run some of the demo apps here, you'll need to first sign in to Replicate with your GitHub account, then create a free API token here that you can use for a while. After the free trial ends, you'll need to enter billing info to continue using Llama2 hosted on Replicate. According to Replicate's Run time and cost for the Llama2-13b-chat model used in our demo apps, the model "costs $0.000725 per second. Predictions typically complete within 10 seconds." This means each call to the Llama2-13b-chat model costs less than $0.01 if the call completes within 10 seconds. If you want to avoid costs entirely, refer to the section "Running Llama2 locally on Mac" above or "Running Llama2 in Google Colab" below.
To run Llama2 in Google Colab using llama-cpp-python, download the quantized Llama2-13b-chat model ggml-model-q4_0.gguf here, or follow the instructions above to build it, then upload it to your Google Drive. Note that on the free Colab T4 GPU, the call to Llama2 could take more than 20 minutes to return; running the notebook locally on an M1 MBP takes about 20 seconds.
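The cost arithmetic in the note can be checked with a few lines of Python. This is a minimal sketch that assumes the quoted per-second price still applies; `estimate_cost` is a hypothetical helper written for this example, not part of any Replicate client.

```python
# Per-second price quoted in the note for Llama2-13b-chat on Replicate.
# Assumption: the quoted price is still current; check Replicate's pricing page.
PRICE_PER_SECOND_USD = 0.000725


def estimate_cost(seconds: float, calls: int = 1) -> float:
    """Return the estimated cost in USD of `calls` predictions lasting `seconds` each."""
    return PRICE_PER_SECOND_USD * seconds * calls


# A single call that completes in 10 seconds stays under one cent.
print(f"one 10-second call:  ${estimate_cost(10):.5f}")
# The same estimate scaled to 100 calls.
print(f"100 10-second calls: ${estimate_cost(10, calls=100):.3f}")
```

Calls that run longer than the typical 10 seconds scale linearly, so a 30-second call would cost about three times as much.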
* To run a quantized Llama2 model on iOS and Android, you can use the open source MLC LLM or llama.cpp. You can even make a Linux OS that boots to Llama2 (repo).
This demo app uses Llama2 to return a text summary of a YouTube video. It shows how to retrieve the captions of a YouTube video and how to ask Llama2 to summarize the content in four different ways, from the simplest naive approach, which works for short text, to more advanced methods that use LangChain's map_reduce and refine to work around Llama2's 4096-token limit on input size.
This demo app shows how to use LangChain and Llama2 to let users ask questions about structured data stored in a SQL DB. As the 2023-24 NBA season is around the corner, we use NBA roster info saved in a SQLite DB to show you how to ask Llama2 questions about your favorite teams or players. The nba.txt file was created by scraping NBA roster info from the web. To load it into SQLite, run the commands below, which generate the nba_roster.db file used in the notebook:
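The idea behind map_reduce can be illustrated in plain Python without LangChain. This is a rough sketch of the general technique, not LangChain's implementation: the function names are hypothetical, chunk size is counted in words as a crude stand-in for tokens, and `first_words` is a toy placeholder for a real LLM call.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def map_reduce_summary(text: str, summarize: Callable[[str], str], max_words: int) -> str:
    """Summarize each chunk (map), then summarize the joined partial summaries (reduce)."""
    chunks = split_into_chunks(text, max_words)
    if len(chunks) <= 1:
        # Short text fits in a single call, so no map step is needed.
        return summarize(text)
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partial_summaries))


def first_words(text: str, n: int = 5) -> str:
    """Toy stand-in for an LLM call: keeps only the first n words."""
    return " ".join(text.split()[:n])


print(map_reduce_summary("one two three four five six seven eight", first_words, max_words=4))
```

In a real app `summarize` would wrap a call to Llama2, and the chunk size would be chosen so that each chunk plus the prompt stays under the token limit. For very long inputs the joined partial summaries can themselves exceed the limit, in which case the reduce step has to be applied repeatedly.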
python txt2csv.py
python csv2db.py
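A question about a team is ultimately answered by a SQL query against the roster data. The sketch below shows that kind of query with Python's built-in sqlite3 module. The table name, column names, and rows are placeholders invented for illustration; the real schema is whatever csv2db.py creates in nba_roster.db.

```python
import sqlite3

# Hypothetical rows and columns, for illustration only.
rows = [
    ("Team A", "Player One", "G"),
    ("Team A", "Player Two", "F"),
    ("Team B", "Player Three", "C"),
]

# Use an in-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roster (team TEXT, name TEXT, position TEXT)")
conn.executemany("INSERT INTO roster VALUES (?, ?, ?)", rows)
conn.commit()


def players_on(team: str) -> list:
    """Return the names of all players on the given team, sorted alphabetically."""
    cur = conn.execute(
        "SELECT name FROM roster WHERE team = ? ORDER BY name", (team,)
    )
    return [row[0] for row in cur.fetchall()]


print(players_on("Team A"))
```

To try the same pattern against the generated database, connect to nba_roster.db instead of ":memory:" and adjust the table and column names to match its actual schema.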
This demo app shows how to perform live-data-augmented generation tasks with Llama2 and LlamaIndex, another leading open-source framework for building LLM apps: it uses the You.com search API to get breaking news and asks Llama2 questions about it.
If you prefer to see Llama2 in action in a web UI, instead of the notebooks above, you can try one of these two methods:
Open a Terminal and run the following commands:
pip install streamlit langchain replicate
git clone https://github.com/facebookresearch/llama-recipes
cd llama-recipes/llama-demo-apps
Replace the <your replicate api token> in streamlit_llama2.py with your API token created here - for more info, see the note above.
Then run the command streamlit run streamlit_llama2.py. Your browser will show the following UI with a question and answer. You can enter a new question, click Submit, and see Llama2's answer:
To see how to query Llama2 and get answers with the Gradio UI, both in the notebook and on the web, launch the notebook Llama2_Gradio.ipynb and replace the <your replicate api token> with your API token created here - for more info, see the note above.
Then enter your question and click Submit. You'll see the following UI in the notebook or in a browser at the URL http://127.0.0.1:7860: