
Ref from inference recipes to vllm for 405B

Matthias Reso 9 months ago
Parent
Commit
a3fd369127
1 changed file with 3 additions and 0 deletions
  1. +3 −0 recipes/quickstart/inference/local_inference/README.md

+3 −0
recipes/quickstart/inference/local_inference/README.md

@@ -84,3 +84,6 @@ Then run inference using:
 python inference.py --model_name <training_config.output_dir> --prompt_file <test_prompt_file>
 
 ```
+
+## Inference on large models like Meta Llama 405B
+To run the Meta Llama 405B variant without quantization we need to use a multi-node setup for inference. The llama-recipes inference script currently does not support multi-node inference. To run this model you can use vLLM with pipeline and tensor parallelism as shown in [this example](../../../3p_integrations/vllm/README.md).
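
The linked example covers the full multi-node setup. As a rough sketch of what combining tensor and pipeline parallelism looks like with vLLM's Python API, the snippet below is illustrative only and is not part of this commit: the model id, the parallel sizes, and the Ray executor backend are assumptions to adapt to your cluster.

```python
# Minimal sketch (not from the commit): Meta Llama 405B with vLLM using
# tensor parallelism within a node and pipeline parallelism across nodes.
# Assumes a Ray cluster is already running across the participating nodes.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct",  # assumed HF model id
    tensor_parallel_size=8,               # GPUs per node (assumption)
    pipeline_parallel_size=2,             # number of nodes (assumption)
    distributed_executor_backend="ray",   # multi-node execution via Ray
)

sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(
    ["Explain pipeline parallelism in one paragraph."], sampling_params
)
print(outputs[0].outputs[0].text)
```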