@@ -105,7 +105,7 @@ python inference.py --model_name <training_config.output_dir> --peft_model <trai
## Inference with FSDP checkpoints
-In case you have fine-tuned your model with pure FSDP and saved the checkpoints with "SHARDED_STATE_DICT" as shown [here](../../../src/llama_recipes/configs/fsdp.py), you can use this converter script to convert the FSDP Sharded checkpoints into HuggingFace checkpoints. This enables you to use the inference script normally as mentioned above.
+In case you have fine-tuned your model with pure FSDP and saved the checkpoints with "SHARDED_STATE_DICT" as shown [here](../../../src/llama_cookbook/configs/fsdp.py), you can use this converter script to convert the FSDP Sharded checkpoints into HuggingFace checkpoints. This enables you to use the inference script normally as mentioned above.
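
The linked config selects this behavior via its `checkpoint_type` field. A minimal sketch of the relevant setting (assumed layout; check the linked file for the exact definition):

```python
# Sketch of the relevant field in src/llama_cookbook/configs/fsdp.py (assumed layout).
from dataclasses import dataclass
from torch.distributed.fsdp import StateDictType

@dataclass
class fsdp_config:
    # SHARDED_STATE_DICT writes one shard per rank; these shards are what the
    # converter script below consolidates into a HuggingFace checkpoint.
    checkpoint_type: StateDictType = StateDictType.SHARDED_STATE_DICT
```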
**To convert the checkpoint, use the following command**:
This is helpful if you have fine-tuned your model using FSDP only, as follows:
@@ -115,7 +115,7 @@ torchrun --nnodes 1 --nproc_per_node 8 recipes/quickstart/finetuning/finetuning
```
Then convert your FSDP checkpoints to HuggingFace checkpoints using:
```bash
- python -m llama_recipes.inference.checkpoint_converter_fsdp_hf --fsdp_checkpoint_path PATH/to/FSDP/Checkpoints --consolidated_model_path PATH/to/save/checkpoints --HF_model_path_or_name PATH/or/HF/model_name
+ python -m llama_cookbook.inference.checkpoint_converter_fsdp_hf --fsdp_checkpoint_path PATH/to/FSDP/Checkpoints --consolidated_model_path PATH/to/save/checkpoints --HF_model_path_or_name PATH/or/HF/model_name
# --HF_model_path_or_name specifies the HF Llama model name or a path containing config.json and tokenizer.json
```
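
After conversion, the consolidated checkpoint loads like any regular HuggingFace model, which is a quick way to verify the conversion (paths are the placeholders from the command above):

```python
# Sanity check: load the converted checkpoint with standard transformers APIs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("PATH/to/save/checkpoints")
tokenizer = AutoTokenizer.from_pretrained("PATH/to/save/checkpoints")
print(model.config.model_type)
```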
@@ -130,4 +130,4 @@ python inference.py --model_name <training_config.output_dir> --prompt_file <tes
## Inference on large models like Meta Llama 405B
The FP8 quantized variants of Meta Llama (i.e. meta-llama/Meta-Llama-3.1-405B-FP8 and meta-llama/Meta-Llama-3.1-405B-Instruct-FP8) can be executed on a single node with 8x80GB H100 using the scripts located in this folder.
-To run the unquantized Meta Llama 405B variants (i.e. meta-llama/Meta-Llama-3.1-405B and meta-llama/Meta-Llama-3.1-405B-Instruct) we need to use a multi-node setup for inference. The llama-recipes inference script currently does not allow multi-node inference. To run this model you can use vLLM with pipeline and tensor parallelism as showed in [this example](../../../3p-integrations/vllm/README.md).
+To run the unquantized Meta Llama 405B variants (i.e. meta-llama/Meta-Llama-3.1-405B and meta-llama/Meta-Llama-3.1-405B-Instruct) we need to use a multi-node setup for inference. The llama-cookbook inference script currently does not support multi-node inference. To run this model you can use vLLM with pipeline and tensor parallelism as shown in [this example](../../../3p-integrations/vllm/README.md).
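
A minimal sketch of such a vLLM setup (the parallelism sizes here are illustrative assumptions, not the linked example's settings, and multi-node execution requires a Ray cluster spanning the nodes):

```python
# Illustrative vLLM setup for the unquantized 405B model across two 8-GPU nodes.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct",
    tensor_parallel_size=8,    # shard each layer across the 8 GPUs of a node (assumed)
    pipeline_parallel_size=2,  # split the layer stack across 2 nodes (assumed)
)
outputs = llm.generate(["Hello, world!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```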