[
  {
    "question": "What if I want to access Llama models but I’m not sure if my use is permitted under the Llama 2 Community License?",
    "answer": "On a limited case-by-case basis, we will consider bespoke licensing requests from individual entities. Please contact llamamodels@meta.com to provide more details about your request."
  },
  {
    "question": "Why are you not sharing the training datasets for Llama?",
    "answer": "We believe developers will have plenty to work with as we release our model weights and starting code for pre-trained and conversational fine-tuned versions as well as responsible use resources. While data mixes are intentionally withheld for competitive reasons, all models have gone through Meta’s internal Privacy Review process to ensure responsible data usage in building our products. We are dedicated to the responsible and ethical development of our GenAI products, ensuring our policies reflect diverse contexts and meet evolving societal expectations."
  },
  {
    "question": "Did we use human annotators to develop the data for our models?",
    "answer": "Yes. There are more details, for example, about our use of human annotators in the Llama 2 research paper."
  },
  {
    "question": "Can I use the output of the models to improve the Llama family of models, even though I cannot use them for other LLMs?",
    "answer": "It's correct that the license restricts using any part of the Llama models, including the response outputs, to train another AI model (LLM or otherwise). However, one can use the outputs to further train the Llama family of models. Techniques such as Quantization-Aware Training (QAT) utilize model outputs in this way, and hence this is allowed."
  },
  {
    "question": "What operating systems (OS) are officially supported?",
    "answer": "For the core Llama GitHub repos (Llama and Llama 3), Linux is currently the only supported OS. Additional OS support is available through the Llama-Recipes repo."
  },
  {
    "question": "I am getting 'Issue with the URL' as an error message. What should I do?",
    "answer": "This issue occurs when the URL is not copied correctly. If you right-click on the link and copy it, the link may be copied with a URL Defense wrapper. To avoid this issue, select the URL manually and copy it."
  },
  {
    "question": "Does Llama 2 support other languages outside of English?",
    "answer": "The model was primarily trained on English with a bit of additional data from 27 other languages (for more information, see Table 10 on page 20 of the Llama 2 paper). We do not expect the same level of performance in these languages as in English. You’ll find the full list of languages referenced in the research paper. You can look at some of the community-led projects to fine-tune Llama 2 models to support other languages. (e.g., link)"
  },
  {
    "question": "If I’m a developer/business, how can I access the models?",
    "answer": "Details on how to access the models are available on our website link. Please note that the models are subject to the acceptable use policy and the provided responsible use guide. Models are available through multiple sources, but the place to start is https://llama.meta.com/. Model code, a quickstart guide, and fine-tuning examples are available through our GitHub Llama repository. Model weights are available through an email link after the user submits a sign-up form. Models are also being hosted by Microsoft, Amazon Web Services, and Hugging Face, and may also be available through other hosting providers in the future."
  },
  {
    "question": "Can anyone access Llama models? What are the terms?",
    "answer": "Llama models are broadly available to developers and licensees through a variety of hosting providers and on the Meta website and licensed under the applicable Llama Community License Agreement, which provides a permissive license to the models along with certain restrictions to help ensure that the models are being used responsibly."
  },
  {
    "question": "What are the hardware SKU requirements for deploying these models?",
    "answer": "Hardware requirements vary based on latency, throughput, and cost constraints. For good latency, we split models across multiple GPUs with tensor parallelism in a machine with NVIDIA A100s or H100s. But TPUs, other types of GPUs, or even commodity hardware can also be used to deploy these models (e.g., llama.cpp, MLC LLM)."
  },
  {
    "question": "Do Llama models provide traditional autoregressive text completion?",
    "answer": "Llama models are auto-regressive language models, built on the transformer architecture. The core language models function by taking a sequence of words as input and predicting the next word, recursively generating text."
  },
  {
    "question": "Does the model support fill-in-the-middle completion, e.g. allowing the user to specify a suffix string for the response?",
    "answer": "The vanilla Llama models do not; however, the Code Llama models have been trained with fill-in-the-middle completion to assist with tasks like code completion."
  },
  {
    "question": "Do Llama models support logit biases as a request parameter to control token probabilities during sampling?",
    "answer": "This is implementation-dependent (i.e., it depends on the code used to run the model)."
  },
  {
    "question": "Do Llama models support adjusting sampling temperature or top-p threshold via request parameters?",
    "answer": "The model itself supports these parameters, but whether they are exposed or not depends on the implementation."
  },
  {
    "question": "What is the most effective RAG method paired with Llama models?",
    "answer": "There are many ways to use RAG with Llama. The most popular libraries are LangChain and LlamaIndex, and many of our developers have used them successfully with Llama 2. (See the LangChain and LlamaIndex sections of this document.)"
  },
  {
    "question": "How do I set up Llama models with an EC2 instance?",
    "answer": "You can find steps on how to set up an EC2 instance in the AWS section of this document here."
  },
  {
    "question": "What is the right size of EC2 instance needed for running each of the Llama models?",
    "answer": "The AWS section of this document has some insights on instance size that you can start with. You can find the section here."
  },
  {
    "question": "Should we start training with the base or instruct/chat model?",
    "answer": "This depends on your application. The Llama pre-trained models were trained for general large language applications, whereas the Llama instruct or chat models were fine-tuned for dialogue-specific uses like chatbots."
  },
  {
    "question": "I keep getting a 'CUDA out of memory' error.",
    "answer": "This error can be caused by a number of different factors, including the model size being too large, inefficient memory usage, and so on. Some of the steps below have been known to help with this issue, but you might need to do some troubleshooting to figure out the exact cause of your issue. 1. Ensure your GPU has enough memory. 2. Reduce the batch_size. 3. Lower the precision. 4. Clear the cache. 5. Modify the model/training."
  },
  {
    "question": "The retrieval approach adds latency due to multiple calls at each turn. How can we best leverage Llama with retrieval?",
    "answer": "If multiple calls are necessary, then you could look into the following: 1. Optimize inference so each call has less latency. 2. Merge the calls into fewer calls. For example, summarize the data and utilize the summary. 3. Possibly utilize Llama 2 function calling. 4. Consider fine-tuning the model with the updated data."
  },
  {
    "question": "How can I fine-tune the Llama models?",
    "answer": "You can find examples on how to fine-tune the Llama models in the Llama Recipes repository."
  },
  {
    "question": "How can I pretrain the Llama models?",
    "answer": "You can adapt the fine-tuning script found here for pre-training. You can also find the hyperparameters used for pretraining in Section 2 of the Llama 2 paper."
  },
  {
    "question": "Am I allowed to develop derivative models through fine-tuning based on Llama models for languages other than English? Is this a violation of the acceptable use policy?",
    "answer": "Developers may fine-tune Llama models for languages beyond English provided they comply with the applicable Llama 3 License Agreement, Llama Community License Agreement, and the Acceptable Use Policy."
  },
  {
    "question": "How can someone reduce hallucinations with fine-tuned Llama models?",
    "answer": "Although prompts cannot eliminate hallucinations completely, they can reduce them significantly. Using techniques like Chain-of-Thought, Instruction-Based, N-Shot, and Few-Shot prompting can help depending on your application. Additionally, prompting the models to back up their responses by verifying against factual data sets, or requesting that the models provide the source of information, can help as well. Overall, fine-tuning should also be helpful for reducing hallucinations."
  },
  {
    "question": "What are the hardware SKU requirements for fine-tuning Llama pre-trained models?",
    "answer": "Fine-tuning requirements also vary based on amount of data, time to complete fine-tuning, and cost constraints. To fine-tune these models we have generally used multiple NVIDIA A100 machines with data parallelism across nodes and a mix of data and tensor parallelism intra-node. But using a single machine or other GPU types is definitely possible (e.g., Alpaca models are trained on a single RTX 4090: https://github.com/tloen/alpaca-lora)."
  },
  {
    "question": "What fine-tuning tasks would these models support?",
    "answer": "The Llama 2 fine-tuned models were fine-tuned for dialogue-specific uses like chatbots."
  },
  {
    "question": "Are there examples on how one can fine-tune the models?",
    "answer": "You can find example fine-tuning scripts in the GitHub recipes repository. You can also review the fine-tuning section in this document."
  },
  {
    "question": "What is the difference between a pre-trained and fine-tuned model?",
    "answer": "The Llama pre-trained models were trained for general large language applications, whereas the Llama chat or instruct models were fine-tuned for dialogue-specific uses like chatbots."
  },
  {
    "question": "How should we think about post-processing (validating generated data) as a way to fine-tune models?",
    "answer": "Essentially, having truthful data for the specific application can be helpful to reduce the risk on that application. Also, setting some sort of threshold, such as prob > 90%, might be helpful to get more confidence in the output."
  },
  {
    "question": "What are the different libraries that we recommend for fine-tuning?",
    "answer": "You can find some fine-tuning recommendations in the GitHub recipes repository as well as the fine-tuning section of this document."
  },
  {
    "question": "How can we identify the right ‘r’ value for the LoRA method for a certain use case?",
    "answer": "The best approach would be to review the LoRA research paper for more information on ranks, then review similar implementations for other models, and finally experiment."
  },
  {
    "question": "We hope to use prompt engineering as a lever to nudge behavior. Any pointers on enhancing instruction-following by fine-tuning small Llama models?",
    "answer": "Take a look at the fine-tuning section of our Getting Started with Llama guide in this document for some pointers on fine-tuning."
  },
  {
    "question": "Strategies to help models handle longer conversations?",
    "answer": "You can find some helpful information on this in the Prompting and LangChain sections of this document."
  },
  {
    "question": "Are Llama models open source? What is the exact license these models are published under?",
    "answer": "Llama models are licensed under a bespoke commercial license that balances open access to the models with responsibility and protections in place to help address potential misuse. Our license allows for broad commercial use, as well as for developers to create and redistribute additional work on top of Llama models. For more details, our licenses can be found at https://llama.meta.com/license/ (Meta Llama 2) and https://llama.meta.com/llama3/license/ (Meta Llama 3)."
  },
  {
    "question": "Are there examples that help licensees better understand how “MAU” is defined?",
    "answer": "'MAU' means 'monthly active users' that access or use your (and your affiliates’) products and services. Examples include users accessing an internet-based service and monthly users/customers of licensee’s hardware devices."
  },
  {
    "question": "Does the Critical Infrastructure restriction in the acceptable use policy (AUP) prevent companies who have special critical infrastructure certification (e.g., a registered operator of “critical infrastructure” under the German BSI Act) from using Llama?",
    "answer": "No, such companies are not prohibited when their usage of Llama is not related to the operation of critical infrastructure. Llama, however, may not be used in the operation of critical infrastructure by any company, regardless of government certifications."
  }
]