
Ref from inference recipes to vllm for 405B

Matthias Reso 9 months ago
Parent
Commit
a3fd369127
1 changed file with 3 additions and 0 deletions
  1. +3 −0 recipes/quickstart/inference/local_inference/README.md

+3 −0
recipes/quickstart/inference/local_inference/README.md

@@ -84,3 +84,6 @@ Then run inference using:
 python inference.py --model_name <training_config.output_dir> --prompt_file <test_prompt_file>
 
 ```
+
+## Inference on large models like Meta Llama 405B
+To run the Meta Llama 405B variant without quantization we need to use a multi-node setup for inference. The llama-recipes inference script currently does not support multi-node inference. To run this model you can use vLLM with pipeline and tensor parallelism as shown in [this example](../../../3p_integrations/vllm/README.md).
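
The linked example covers the full multi-node setup. As a rough sketch of what combining tensor and pipeline parallelism looks like with vLLM's Python API, the snippet below is illustrative only and is not part of this commit: the model id, the parallel sizes, and the Ray executor backend are assumptions to adapt to your cluster.

```python
# Minimal sketch (not from the commit): Meta Llama 405B with vLLM using
# tensor parallelism within a node and pipeline parallelism across nodes.
# Assumes a Ray cluster is already running across the participating nodes.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct",  # assumed HF model id
    tensor_parallel_size=8,               # GPUs per node (assumption)
    pipeline_parallel_size=2,             # number of nodes (assumption)
    distributed_executor_backend="ray",   # multi-node execution via Ray
)

sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(
    ["Explain pipeline parallelism in one paragraph."], sampling_params
)
print(outputs[0].outputs[0].text)
```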