
* Add new readmes
* Move prompt-engineering notebook to folder
* Delete unnecessary samsum file

Suraj Subramanian 11 months ago
parent
commit
b273a75a97

+ 3 - 3
recipes/inference/local_inference/README.md

@@ -61,7 +61,7 @@ python inference.py --model_name <training_config.output_dir> --peft_model <trai
 
 ```
 
-## Loading back FSDP checkpoints
+## Inference with FSDP checkpoints
 
 In case you have fine-tuned your model with pure FSDP and saved the checkpoints with "SHARDED_STATE_DICT" as shown [here](../../../src/llama_recipes/configs/fsdp.py), you can use this converter script to convert the FSDP Sharded checkpoints into HuggingFace checkpoints. This enables you to use the inference script normally as mentioned above.
 **To convert the checkpoint use the following command**:
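As a hedged sketch of that conversion, assuming the converter ships as the module `llama_recipes.inference.checkpoint_converter_fsdp_hf` and accepts the flag names below (verify against the converter script under `src/llama_recipes/inference/` in your checkout):

```bash
# NOTE: module path and flag names are assumptions; paths are placeholders.
# Consolidates FSDP sharded checkpoints into a single HuggingFace-format checkpoint.
python -m llama_recipes.inference.checkpoint_converter_fsdp_hf \
    --fsdp_checkpoint_path PATH/to/FSDP/model/checkpoints \
    --consolidated_model_path PATH/to/save/checkpoints \
    --HF_model_path_or_name PATH/or/HF/model_name
```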
@@ -82,6 +82,6 @@ By default, training parameters are saved in `train_params.yaml` in the path wher
 Then run inference using:
 
 ```bash
-python inference.py --model_name <training_config.output_dir> --prompt_file <test_prompt_file> 
+python inference.py --model_name <training_config.output_dir> --prompt_file <test_prompt_file>
 
-```
+```

+ 28 - 0
recipes/quickstart/README.md

@@ -0,0 +1,28 @@
+## Llama-Recipes Quickstart
+
+If you are new to developing with Meta Llama models, this is where you should start. This folder contains introductory-level notebooks across different techniques relating to Meta Llama.
+
+* The [Running_Llama3_Anywhere](./Running_Llama3_Anywhere/) notebooks demonstrate how to run Llama inference across Linux, Mac and Windows platforms using the appropriate tooling.
+* The [Prompt_Engineering_with_Llama_3](./prompt_engineering/Prompt_Engineering_with_Llama_3.ipynb) notebook showcases the various ways to elicit appropriate outputs from Llama. Take this notebook for a spin to get a feel for how Llama responds to different inputs and generation parameters.
+* The [inference](./inference/) folder contains scripts to deploy Llama for inference on server and mobile. See also [3p_integrations/vllm](../3p_integrations/vllm/) and [3p_integrations/tgi](../3p_integrations/tgi/) for hosting Llama on open-source model servers.
+* The [RAG](./RAG/) folder contains a simple Retrieval-Augmented Generation application using Llama 3.
+* The [finetuning](./finetuning/) folder contains resources to help you finetune Llama 3 on your custom datasets, for both single- and multi-GPU setups. The scripts use the native llama-recipes finetuning code found in [finetuning.py](../../src/llama_recipes/finetuning.py), which supports the features listed below (see the launch sketch after the table):
+
+| Feature                                        | Supported |
+| ---------------------------------------------- | --------- |
+| HF support for finetuning                      | ✅ |
+| Deferred initialization (meta init)            | ✅ |
+| HF support for inference                       | ✅ |
+| Low CPU mode for multi GPU                     | ✅ |
+| Mixed precision                                | ✅ |
+| Single node quantization                       | ✅ |
+| Flash attention                                | ✅ |
+| PEFT                                           | ✅ |
+| Activation checkpointing FSDP                  | ✅ |
+| Hybrid Sharded Data Parallel (HSDP)            | ✅ |
+| Dataset packing & padding                      | ✅ |
+| BF16 optimizer (pure BF16)                     | ✅ |
+| Gradient accumulation                          | ✅ |
+| CPU offloading                                 | ✅ |
+| FSDP checkpoint conversion to HF for inference | ✅ |
+| W&B experiment tracker                         | ✅ |
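
As referenced above, a minimal launch sketch for the finetuning scripts, assuming the `llama_recipes.finetuning` CLI entry point and the flag names shown (the finetuning README in this repo is the authoritative reference for your version):

```bash
# Single-GPU PEFT (LoRA) finetuning; model name and output path are placeholders.
python -m llama_recipes.finetuning \
    --use_peft --peft_method lora \
    --model_name meta-llama/Meta-Llama-3-8B \
    --output_dir ./peft_output

# Multi-GPU variant with FSDP enabled; flag names assumed,
# adjust --nproc_per_node to the number of available GPUs.
torchrun --nnodes 1 --nproc_per_node 4 -m llama_recipes.finetuning \
    --enable_fsdp --use_peft --peft_method lora \
    --model_name meta-llama/Meta-Llama-3-8B \
    --output_dir ./peft_output
```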

+ 7 - 0
recipes/quickstart/inference/README.md

@@ -0,0 +1,7 @@
+## Quickstart > Inference
+
+This folder contains scripts to get you started with inference on Meta Llama models.
+
+* [code_llama](./code_llama/) contains scripts for tasks relating to code generation using CodeLlama
+* [local_inference](./local_inference/) contains scripts for memory-efficient inference on servers and local machines
+* [mobile_inference](./mobile_inference/) has scripts using MLC to serve Llama on Android (h/t to OctoAI for the contribution!)

recipes/quickstart/Prompt_Engineering_with_Llama_3.ipynb → recipes/quickstart/prompt_engineering/Prompt_Engineering_with_Llama_3.ipynb