sekyondaMeta fbffb0e430 Merge branch 'facebookresearch:main' into inferenceDocUpdate 2 years ago
hf-text-generation-inference 4767f09ecd Initial commit 2 years ago
README.md f70fceb8c7 Moved inference.md to docs 2 years ago
chat_completion.py 557e881fcc aliginng the pad token with HF latest 2 years ago
chat_utils.py 4767f09ecd Initial commit 2 years ago
chats.json 4767f09ecd Initial commit 2 years ago
inference.py 7ec390bfc8 aliging special tokens in toeknizer with HF latest 2 years ago
model_utils.py 4767f09ecd Initial commit 2 years ago
safety_utils.py 4767f09ecd Initial commit 2 years ago
samsum_prompt.txt 4767f09ecd Initial commit 2 years ago
vLLM_inference.py 4767f09ecd Initial commit 2 years ago

README.md

Inference

For inference, we provide an inference script. The arguments it takes depend on the type of finetuning performed during training. If all model parameters were finetuned, pass the output dir of the training as the --model_name argument. For a parameter-efficient method such as LoRA, pass the base model as --model_name and the output dir of the training as the --peft_model argument. In addition, a prompt for the model must be provided as a text file, either piped through standard input or given with the --prompt_file parameter.
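The two argument combinations described above can be sketched as a small helper that assembles the command line. This is only an illustration: `build_inference_command` is a hypothetical name, the script is assumed to be the `inference.py` file in this folder, and the `path/to/...` values are placeholders rather than real locations. Only the `--model_name`, `--peft_model` and `--prompt_file` arguments come from the text above.

```python
import shlex
from typing import List, Optional


def build_inference_command(
    model_name: str,
    peft_model: Optional[str] = None,
    prompt_file: Optional[str] = None,
    script: str = "inference.py",  # assumption: the script in this folder
) -> List[str]:
    """Hypothetical helper: assemble the argument list for the inference script."""
    cmd = ["python", script, "--model_name", model_name]
    if peft_model is not None:
        # Parameter-efficient finetuning: the training output is passed separately.
        cmd += ["--peft_model", peft_model]
    if prompt_file is not None:
        # If omitted, the prompt is expected on standard input instead.
        cmd += ["--prompt_file", prompt_file]
    return cmd


# Full-parameter finetuning: the training output dir is the model itself.
full_ft_cmd = build_inference_command(
    model_name="path/to/training_output",  # placeholder path
    prompt_file="samsum_prompt.txt",
)

# LoRA-style finetuning: base model plus the training output dir as the PEFT model.
lora_cmd = build_inference_command(
    model_name="path/to/base_model",  # placeholder path
    peft_model="path/to/training_output",  # placeholder path
    prompt_file="samsum_prompt.txt",
)

if __name__ == "__main__":
    # Print the commands in a form that can be pasted into a shell.
    print(shlex.join(full_ft_cmd))
    print(shlex.join(lora_cmd))
```

To pipe the prompt instead, leave out `prompt_file` and feed the text file to the process on standard input, for example by passing an open file handle as `stdin` to `subprocess.run`.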

For other inference options, you can use the vLLM_inference.py script for vLLM or review the hf-text-generation-inference folder for TGI.

For more information, including inference safety checks, examples, and other available inference options, see the inference documentation here.