@@ -84,3 +84,6 @@ Then run inference using:
 python inference.py --model_name <training_config.output_dir> --prompt_file <test_prompt_file>
 
 ```
+
+## Inference on large models like Meta Llama 405B
+To run the Meta Llama 405B variant without quantization, we need to use a multi-node setup for inference. The llama-recipes inference script currently does not support multi-node inference. To run this model, you can use vLLM with pipeline and tensor parallelism as shown in [this example](../../../3p_integrations/vllm/README.md).
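For reference, a minimal sketch of the multi-node launch the added paragraph describes, assuming a vLLM build with pipeline-parallel support and Ray as the multi-node backend. The model id, node count, port, and parallel sizes below are illustrative assumptions; the linked 3p_integrations/vllm example remains the authoritative recipe.

```bash
# On the head node: start a Ray cluster for vLLM's multi-node backend
# (the port is an illustrative choice).
ray start --head --port=6379

# On each worker node: join the cluster (replace <head_node_ip> with the
# head node's address).
ray start --address=<head_node_ip>:6379

# On the head node: serve the model across nodes. Assumed setup: two
# 8-GPU nodes, 8-way tensor parallelism within each node, 2-way pipeline
# parallelism across the nodes. The model id is illustrative.
vllm serve meta-llama/Llama-3.1-405B-Instruct \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 2
```

The product of the two parallel sizes (8 × 2 = 16) must match the total number of GPUs across the cluster; tensor parallelism shards each layer within a node's fast interconnect, while pipeline parallelism splits the layer stack across nodes where the interconnect is slower.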