# Llama-Recipes Quickstart

If you are new to developing with Meta Llama models, this is where you should start. This folder contains introductory notebooks covering different techniques for working with Meta Llama.

  • The [Running_Llama3_Anywhere](./Running_Llama3_Anywhere/) notebooks demonstrate how to run Llama inference across Linux, Mac and Windows platforms using the appropriate tooling.
  • The [Prompt_Engineering_with_Llama_3](./Prompt_Engineering_with_Llama_3.ipynb) notebook showcases the various ways to elicit appropriate outputs from Llama. Take this notebook for a spin to get a feel for how Llama responds to different inputs and generation parameters.
  • The [inference](./inference/) folder contains scripts to deploy Llama for inference on server and mobile. See also [3p_integration/vllm](../3p_integration/vllm/) and [3p_integration/tgi](../3p_integration/tgi/) for hosting Llama on open-source model servers.
  • The [RAG](./RAG/) folder contains a simple Retrieval-Augmented Generation application using Llama 3.
  • The [finetuning](./finetuning/) folder contains resources to help you finetune Llama 3 on your custom datasets, for both single- and multi-GPU setups. The scripts use the native llama-recipes finetuning code found in [finetuning.py](../../src/llama_recipes/finetuning.py), which supports these features:

| Feature                                        | Supported |
| ---------------------------------------------- | --------- |
| HF support for finetuning                      | ✅ |
| Deferred initialization (meta init)            | ✅ |
| HF support for inference                       | ✅ |
| Low CPU mode for multi GPU                     | ✅ |
| Mixed precision                                | ✅ |
| Single node quantization                       | ✅ |
| Flash attention                                | ✅ |
| PEFT                                           | ✅ |
| Activation checkpointing FSDP                  | ✅ |
| Hybrid Sharded Data Parallel (HSDP)            | ✅ |
| Dataset packing & padding                      | ✅ |
| BF16 Optimizer (Pure BF16)                     | ✅ |
| Gradient accumulation                          | ✅ |
| CPU offloading                                 | ✅ |
| FSDP checkpoint conversion to HF for inference | ✅ |
| W&B experiment tracker                         | ✅ |
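
The feature list mentions "Dataset packing & padding" without saying what the two strategies do. The sketch below illustrates the general idea on toy token lists. It is not the llama-recipes implementation: the function names `pad_batch` and `pack_samples`, and the `pad_id` and `eos_id` values, are hypothetical and chosen only for illustration.

```python
from typing import List


def pad_batch(samples: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Padding: extend every sample to the length of the longest one."""
    max_len = max(len(s) for s in samples)
    return [s + [pad_id] * (max_len - len(s)) for s in samples]


def pack_samples(
    samples: List[List[int]], chunk_size: int, eos_id: int = 2
) -> List[List[int]]:
    """Packing: concatenate samples, separated by an end-of-sequence token,
    then cut the stream into fixed-size chunks."""
    buffer: List[int] = []
    for s in samples:
        buffer.extend(s + [eos_id])
    # Keep only full chunks; a trailing remainder is dropped in this sketch.
    n_full = len(buffer) // chunk_size
    return [buffer[i * chunk_size:(i + 1) * chunk_size] for i in range(n_full)]


# Toy "tokenized" samples (hypothetical token ids).
samples = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]

padded = pad_batch(samples)                    # one row per sample, same length
packed = pack_samples(samples, chunk_size=4)   # dense rows, samples may span rows
```

With padding, each row holds one sample and short samples spend part of the row on pad tokens. With packing, every row is filled with real tokens, at the cost of samples being split across row boundaries.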