
Ref from inference recipes to vllm for 405B

Matthias Reso 9 months ago
parent
commit
a3fd369127

+ 3 - 0
recipes/quickstart/inference/local_inference/README.md

@@ -84,3 +84,6 @@ Then run inference using:
 python inference.py --model_name <training_config.output_dir> --prompt_file <test_prompt_file>
 
 ```
+
+## Inference on large models like Meta Llama 405B
+To run the Meta Llama 405B variant without quantization, we need to use a multi-node setup for inference. The llama-recipes inference script currently does not support multi-node inference. To run this model you can use vLLM with pipeline and tensor parallelism, as shown in [this example](../../../3p_integrations/vllm/README.md).
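+
+For reference, a minimal sketch of such a vLLM launch is shown below. The model id, GPU counts, and cluster setup are assumptions rather than values from this repository; follow the linked example for the supported workflow.
+
+```
+# Sketch only: assumes two nodes with 8 GPUs each, already joined into a Ray
+# cluster, and that you have access to the model id on the Hugging Face Hub.
+# Tensor parallelism shards each layer across the GPUs of a node, while
+# pipeline parallelism splits the layer stack across the nodes.
+vllm serve meta-llama/Meta-Llama-3.1-405B-Instruct \
+    --tensor-parallel-size 8 \
+    --pipeline-parallel-size 2
+```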