radu/LLamaRecipes @ a1d51de4b0e6da317e00de0848a9b234f77ba36b


Llama-Recipes Quickstart

If you are new to developing with Meta Llama models, this is where you should start. This folder contains introductory notebooks that cover different techniques for working with Meta Llama.

* The Running_Llama3_Anywhere notebooks demonstrate how to run Llama inference across Linux, Mac and Windows platforms using the appropriate tooling.
* The Prompt_Engineering_with_Llama_3 notebook showcases the various ways to elicit appropriate outputs from Llama. Take this notebook for a spin to get a feel for how Llama responds to different inputs and generation parameters.
* The inference folder contains scripts to deploy Llama for inference on server and mobile. See also 3p_integrations/vllm and 3p_integrations/tgi for hosting Llama on open-source model servers.
* The RAG folder contains a simple Retrieval-Augmented Generation application using Llama 3.
* The finetuning folder contains resources to help you finetune Llama 3 on your custom datasets, for both single- and multi-GPU setups. The scripts use the native llama-recipes finetuning code found in finetuning.py, which supports these features:

| Feature                                        |    |
| ---------------------------------------------- | -- |
| HF support for finetuning                      | ✅ |
| Deferred initialization (meta init)            | ✅ |
| HF support for inference                       | ✅ |
| Low CPU mode for multi GPU                     | ✅ |
| Mixed precision                                | ✅ |
| Single node quantization                       | ✅ |
| Flash attention                                | ✅ |
| PEFT                                           | ✅ |
| Activation checkpointing FSDP                  | ✅ |
| Hybrid Sharded Data Parallel (HSDP)            | ✅ |
| Dataset packing & padding                      | ✅ |
| BF16 Optimizer (Pure BF16)                     | ✅ |
| Profiling & MFU tracking                       | ✅ |
| Gradient accumulation                          | ✅ |
| CPU offloading                                 | ✅ |
| FSDP checkpoint conversion to HF for inference | ✅ |
| W&B experiment tracker                         | ✅ |
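
To give a feel for one of the listed features, the sketch below contrasts dataset packing with padding on toy token-ID lists. It is an illustration only and is not the llama-recipes implementation: the function names `pack_sequences` and `pad_sequences`, the `chunk_size` and `pad_id` parameters, and the choice to drop leftover tokens are all assumptions made for this example.

```python
from typing import Iterable, List


def pack_sequences(samples: Iterable[List[int]], chunk_size: int) -> List[List[int]]:
    """Concatenate token-ID lists and cut them into chunks of exactly chunk_size.

    Hypothetical helper: leftover tokens that do not fill a final chunk are
    dropped here, which is one possible policy among several.
    """
    buffer: List[int] = []
    chunks: List[List[int]] = []
    for sample in samples:
        buffer.extend(sample)
        # Emit as many full chunks as the buffer currently holds.
        while len(buffer) >= chunk_size:
            chunks.append(buffer[:chunk_size])
            buffer = buffer[chunk_size:]
    return chunks


def pad_sequences(samples: Iterable[List[int]], pad_id: int) -> List[List[int]]:
    """Right-pad every sample with pad_id up to the longest sample's length."""
    samples = list(samples)
    max_len = max((len(s) for s in samples), default=0)
    return [s + [pad_id] * (max_len - len(s)) for s in samples]


# Toy "tokenized" samples of different lengths.
samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
packed = pack_sequences(samples, chunk_size=4)
padded = pad_sequences(samples, pad_id=0)
```

Packing keeps every position in a batch filled with real tokens, at the cost of mixing sample boundaries inside a chunk, while padding preserves one sample per row but spends compute on pad tokens.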
