raft.yaml 3.6 KB

12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152
  1. COT_prompt_template: >
  2. <|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a helpful chatbot who can provide an answer to every questions from the user given a relevant context.<|eot_id|>
  3. <|start_header_id|>user<|end_header_id|>
  4. Question: {question}\nContext: {context}\n
  5. Answer this question using the information given by multiple documents in the context above. Here are the things to pay attention to:
  6. - The context contains many documents, each document starts with <DOCUMENT> and ends </DOCUMENT>.
  7. - First provide step-by-step reasoning on how to answer the question.
  8. - In the reasoning, if you need to copy paste some sentences from the context, include them in ##begin_quote## and ##end_quote##. This would mean that things outside of ##begin_quote## and ##end_quote## are not directly copy paste from the context.
  9. - End your response with final answer in the form <ANSWER>: $answer, the answer should less than 60 words.
  10. You MUST begin your final answer with the tag "<ANSWER>:". <|eot_id|><|start_header_id|>assistant<|end_header_id|>
  11. question_prompt_template: >
  12. <|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a synthetic question-answer pair generator. Given a chunk of context about
  13. some topic(s), generate {num_questions} example questions a user could ask and would be answered
  14. using information from the chunk. For example, if the given context was a Wikipedia
  15. paragraph about the United States, an example question could be 'How many states are
  16. in the United States?
  17. Your questions should be formulated in the same style as questions that users could ask in a search engine.
  18. This means that your questions MUST NOT mention something like "according to the passage" or "context".
  19. The questions should be able to be answered in 60 words or less. Include only the questions in your response.<|eot_id|>
  20. <|start_header_id|>user<|end_header_id|>
  21. Context: {context}\n <|eot_id|><|start_header_id|>assistant<|end_header_id|>
  22. # question_prompt_template: >
  23. # <|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a language model skilled in creating quiz questions.
  24. # You will be provided with a document,
  25. # read it and please generate factoid question and answer pairs that are most likely be asked by a user of Llama language models
  26. # which includes LLama, Llama2, Meta Llama3, Code Llama, Meta Llama Guard 1, Meta Llama Guard 2
  27. # Your factoid questions should be answerable with a specific, concise piece of factual information from the context.
  28. # Your factoid questions should be formulated in the same style as questions users could ask in a search engine.
  29. # This means that your factoid questions MUST NOT mention something like "according to the passage" or "context".
  30. # please make sure you follow those rules:
  31. # 1. Generate {num_questions} question answer pairs, you can generate less answer if there is nothing related to
  32. # model, training, fine-tuning and evaluation details of Llama language models,
  33. # 2. The questions can be answered based *solely* on the given passage.
  34. # 3. Avoid asking questions with similar meaning.
  35. # 4. Never use any abbreviation.
  36. # 5. The questions should be able to be answered in 60 words or less. Include only the questions in your response. <|eot_id|>
  37. # <|start_header_id|>user<|end_header_id|>
  38. # Context: {context}\n <|eot_id|><|start_header_id|>assistant<|end_header_id|>
  39. data_dir: "./data"
  40. xml_path: ""
  41. chunk_size: 1000
  42. questions_per_chunk: 5
  43. num_distract_docs: 4 # number of distracting documents to add to each chunk
  44. refusal_probability: 0.05 # probability of related documents to be added to each chunk