
Add README for quickstart + update to codellama url (#578)

Suraj Subramanian · 11 months ago
parent commit 5be3d4a152

+ 29 - 0
recipes/quickstart/README.md

@@ -0,0 +1,29 @@
+## Llama-Recipes Quickstart
+
+If you are new to developing with Meta Llama models, this is where you should start. This folder contains introductory-level notebooks covering different techniques for working with Meta Llama.
+
+* The [Running_Llama3_Anywhere](./Running_Llama3_Anywhere/) notebooks demonstrate how to run Llama inference across Linux, Mac and Windows platforms using the appropriate tooling.
+* The [Prompt_Engineering_with_Llama_3](./prompt_engineering/Prompt_Engineering_with_Llama_3.ipynb) notebook showcases the various ways to elicit appropriate outputs from Llama. Take this notebook for a spin to get a feel for how Llama responds to different inputs and generation parameters.
+* The [inference](./inference/) folder contains scripts to deploy Llama for inference on servers and mobile devices. See also [3p_integration/vllm](../3p_integration/vllm/) and [3p_integration/tgi](../3p_integration/tgi/) for hosting Llama on open-source model servers.
+* The [RAG](./RAG/) folder contains a simple Retrieval-Augmented Generation application using Llama 3.
+* The [finetuning](./finetuning/) folder contains resources to help you finetune Llama 3 on your custom datasets, for both single- and multi-GPU setups. The scripts use the native llama-recipes finetuning code found in [finetuning.py](../../src/llama_recipes/finetuning.py), which supports the features below (an example invocation follows the table):
+
+| Feature                                        | Supported |
+| ---------------------------------------------- | --------- |
+| HF support for finetuning                      | ✅ |
+| Deferred initialization (meta init)            | ✅ |
+| HF support for inference                       | ✅ |
+| Low CPU mode for multi GPU                     | ✅ |
+| Mixed precision                                | ✅ |
+| Single node quantization                       | ✅ |
+| Flash attention                                | ✅ |
+| PEFT                                           | ✅ |
+| Activation checkpointing (FSDP)                | ✅ |
+| Hybrid Sharded Data Parallel (HSDP)            | ✅ |
+| Dataset packing & padding                      | ✅ |
+| BF16 optimizer (pure BF16)                     | ✅ |
+| Profiling & MFU tracking                       | ✅ |
+| Gradient accumulation                          | ✅ |
+| CPU offloading                                 | ✅ |
+| FSDP checkpoint conversion to HF for inference | ✅ |
+| W&B experiment tracker                         | ✅ |
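+
+As a quick example of that flow, a single-GPU LoRA run might look like the sketch below (a minimal sketch, assuming llama-recipes is installed and you have access to the model weights; the exact flags are assumptions, so check the [finetuning](./finetuning/) README and `src/llama_recipes/configs/` for the options your version supports):
+
+```bash
+# Sketch: PEFT (LoRA) finetuning with quantization on a single GPU.
+# Model name and output directory are illustrative placeholders.
+python -m llama_recipes.finetuning \
+    --use_peft --peft_method lora --quantization \
+    --model_name meta-llama/Meta-Llama-3-8B \
+    --output_dir ./peft_output
+```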

+ 7 - 0
recipes/quickstart/inference/README.md

@@ -0,0 +1,7 @@
+## Quickstart > Inference
+
+This folder contains scripts to get you started with inference on Meta Llama models.
+
+* [code_llama](./code_llama/) contains scripts for tasks relating to code generation using Code Llama.
+* [local_inference](./local_inference/) contains scripts for memory-efficient inference on servers and local machines; a sample invocation follows this list.
+* [mobile_inference](./mobile_inference/) has scripts using MLC to serve Llama on Android (h/t to OctoAI for the contribution!).
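+
+For instance, a basic local run might look like this (a minimal sketch; the model id and prompt file are placeholders, and `inference.py` plus its full argument list live in [local_inference](./local_inference/)):
+
+```bash
+# Sketch: memory-efficient local inference on a prompt file.
+# Both arguments below are illustrative placeholders.
+python inference.py \
+    --model_name meta-llama/Meta-Llama-3-8B-Instruct \
+    --prompt_file my_prompt.txt
+```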

+ 2 - 2
recipes/quickstart/inference/code_llama/README.md

@@ -4,7 +4,7 @@ Code llama was recently released with three flavors, base-model that support mul
 
 
Here you can find the scripts to run Code Llama, with two examples of code completion and infilling.
 
 
-**Note** Please find the right model on HF side [here](https://huggingface.co/codellama). 
+**Note** Please find the right model on HF [here](https://huggingface.co/models?search=meta-llama%20codellama). 
 
 
Make sure to install Transformers from source for now.
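
The usual source install goes through pip's git support, for example:

```bash
# Install the latest Transformers from the main branch
pip install git+https://github.com/huggingface/transformers
```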
 
 
@@ -36,4 +36,4 @@ To run the 70B Instruct model example run the following (you'll need to enter th
python code_instruct_example.py --model_name codellama/CodeLlama-70b-Instruct-hf --temperature 0.2 --top_p 0.9
```
-You can learn more about the chat prompt template [on HF](https://huggingface.co/codellama/CodeLlama-70b-Instruct-hf#chat-prompt) and [original Code Llama repository](https://github.com/facebookresearch/codellama/blob/main/README.md#fine-tuned-instruction-models). HF tokenizer has already taken care of the chat template as shown in this example. 
+You can learn more about the chat prompt template [on HF](https://huggingface.co/meta-llama/CodeLlama-70b-Instruct-hf#chat-prompt) and in the [original Code Llama repository](https://github.com/meta-llama/codellama/blob/main/README.md#fine-tuned-instruction-models). The HF tokenizer already takes care of the chat template, as shown in this example.
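+
+As a minimal sketch of what that looks like in code (assuming `transformers` is installed and you have access to the gated weights):
+
+```python
+from transformers import AutoTokenizer
+
+# Load the Code Llama 70B Instruct tokenizer, which ships with a chat template.
+tokenizer = AutoTokenizer.from_pretrained("meta-llama/CodeLlama-70b-Instruct-hf")
+
+messages = [
+    {"role": "system", "content": "You write concise Python."},
+    {"role": "user", "content": "Write a function that reverses a string."},
+]
+
+# tokenize=False returns the fully formatted prompt string instead of token ids.
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+print(prompt)
+```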

+ 1 - 1
recipes/quickstart/inference/local_inference/README.md

@@ -61,7 +61,7 @@ python inference.py --model_name <training_config.output_dir> --peft_model <trai
 
 
```
 
 
-## Loading back FSDP checkpoints
+## Inference with FSDP checkpoints
 
 
In case you have fine-tuned your model with pure FSDP and saved the checkpoints with "SHARDED_STATE_DICT" as shown [here](../../../../src/llama_recipes/configs/fsdp.py), you can use this converter script to convert the FSDP sharded checkpoints into HuggingFace checkpoints. This enables you to use the inference script normally, as described above.
**To convert the checkpoint, use the following command:**
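
For reference, the conversion step looks roughly like this (paths are placeholders, and the module path is an assumption based on this repo's `src/llama_recipes/inference/` tools, so confirm it for your version):

```bash
# Sketch: consolidate FSDP sharded checkpoints into a HuggingFace checkpoint.
# All three paths/names below are illustrative placeholders.
python -m llama_recipes.inference.checkpoint_converter_fsdp_hf \
    --fsdp_checkpoint_path path/to/fsdp/sharded/checkpoints \
    --consolidated_model_path path/to/save/hf_checkpoint \
    --HF_model_path_or_name meta-llama/Meta-Llama-3-8B
```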

recipes/quickstart/Prompt_Engineering_with_Llama_3.ipynb → recipes/quickstart/prompt_engineering/Prompt_Engineering_with_Llama_3.ipynb