
fixing eval template

Kai Wu, 1 year ago
Commit dd4f1dfd7a

File diff suppressed because it is too large
+ 33 - 22
recipes/use_cases/end2end-recipes/chatbot/pipelines/README.md


+ 5 - 4
recipes/use_cases/end2end-recipes/chatbot/pipelines/eval_config.yaml

@@ -1,6 +1,7 @@
 eval_prompt_template: >
-  You are a AI assistant that skilled in answering questions related to Llama model.
-  Below is a question from a llama user, please answer it in {language}, make the answer as concise as possible, it should be at most 100 words.
+  You are an AI assistant skilled in answering questions related to Llama language models,
+  which include Llama, Llama 2, Meta Llama 3, Code Llama, Meta Llama Guard 1, and Meta Llama Guard 2.
+  Below is a question from a Llama user. Think step by step, then answer it in {language}; keep the answer as concise as possible, at most 100 words.
   Return the result with the template:
   [
     {{
@@ -9,9 +10,9 @@ eval_prompt_template: >
   }}
   ]
 judge_prompt_template: >
-  You are provided with a question, a teacher answer and a student answer. Given that question, you need to score the how good the student answer is compare to
+  You are provided with a question, a teacher's answer and a student's answer. Given that question, you need to score how good the student's answer is compared to
   the teacher's answer. If the student's answer is correct based on the teacher's answer, then return YES. If the answer is not faithful, then return NO
-  and explain which part of the answer if not faithful in the Reason section.
+  and explain which part of the student's answer is not faithful in the Reason section.
   Return the result in json format with the template:
     {{
       "Reason": "your reason here.",
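The double braces in these templates are what keep the literal JSON skeleton intact through Python's `str.format`. A minimal sketch of that mechanic, using a trimmed stand-in for `judge_prompt_template` (the `"Result"` field and the reply text below are assumptions for illustration, not taken from the recipe):

```python
import json

# Trimmed stand-in for judge_prompt_template from eval_config.yaml.
# {{ }} survives str.format() as literal braces, so the model sees a JSON
# skeleton while {question} is substituted.
judge_prompt_template = (
    "Score the student's answer against the teacher's answer.\n"
    "Return the result in json format with the template:\n"
    '{{\n  "Reason": "your reason here.",\n  "Result": "YES or NO"\n}}\n'
    "Question: {question}\n"
)

prompt = judge_prompt_template.format(question="What is Llama 2?")

# Parsing a hypothetical judge reply:
reply = '{"Reason": "Matches the teacher answer.", "Result": "YES"}'
verdict = json.loads(reply)
print(verdict["Result"])  # YES
```

Forgetting to double the braces raises a `KeyError`/`IndexError` at format time, which is an easy mistake to make when editing these YAML templates.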

+ 3 - 3
recipes/use_cases/end2end-recipes/chatbot/pipelines/generation_config.yaml

@@ -5,12 +5,12 @@ question_prompt_template: >
   which includes LLama, Llama2, Meta Llama3, Code Llama, Meta Llama Guard 1,	Meta Llama Guard 2,
   then extract the context that is related to the question and answer, preferably using the sentences from original text,
   please make sure you follow those rules:
-  1. Generate {num_questions} question answer pairs.
+  1. Generate {num_questions} question-answer pairs; you may generate fewer if the passage contains nothing related to the model, training, fine-tuning, or evaluation details of Llama language models.
   2. For each question and answer pair, add the context that is related to the question and answer, preferably using the sentences from original text
   3. Generate in {language}.
   4. The questions can be answered based *solely* on the given passage.
   5. Avoid asking questions with similar meaning.
-  6. Make the answer as concise as possible, it should be at most 80 words.
+  6. Make the answer as concise as possible, it should be at most 100 words.
   7. Provide relevant links from the document to support the answer.
   8. Never use any abbreviation.
   9. Return the result in json format with the template:
@@ -30,7 +30,7 @@ question_prompt_template: >
 curation_prompt_template: >
   Below is a question and answer pair (QA pair) and its related context about Llama language models,
   which includes LLama, Llama2, Meta Llama3, Code Llama, Meta Llama Guard 1,	Meta Llama Guard 2.
-  Given the context, evaluate whether or not this qusestion and answer pair will be helpful for a user of Llama language models,
+  Given the context, evaluate whether or not this question and answer pair is related to Llama language models, including model, training, fine-tuning and evaluation details,
   and whether this question and answer is relevant to the context.
   Note that the answer in the QA pair can be the same or similar as the context, as repetition of context is allowed.
   Respond with only a single JSON blob with an "Reason" field that is a short (less than 100 word)
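The curation prompt asks for a single JSON blob, but models sometimes wrap it in prose. A tolerant extraction helper is a common workaround; this is a hypothetical sketch (including the `"Result"` field in the example reply), not the recipe's actual code:

```python
import json
import re

def extract_json_blob(reply):
    # Hypothetical helper: pull the first {...} blob out of a model reply
    # that may include extra prose around the JSON the prompt asked for.
    match = re.search(r'\{.*\}', reply, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

reply = 'Sure! Here is my verdict:\n{"Reason": "QA pair matches the context.", "Result": "YES"}'
print(extract_json_blob(reply)["Result"])  # YES
```

Returning `None` on malformed replies lets the pipeline skip or retry a QA pair instead of crashing mid-batch.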

+ 3 - 0
recipes/use_cases/end2end-recipes/chatbot/pipelines/generator_utils.py

@@ -78,6 +78,9 @@ def read_file_content(context):
             if file_text:
                 file_strings.append(file_text)
     text = '\n'.join(file_strings)
+    text = remove_non_printable(text)
+    with open(context['data_dir'] + '/' + 'all_text.txt', 'w') as f:
+        f.write(text)
     return remove_non_printable(text)
 # clean the text by removing all parts that did not contain any alphanumeric characters
 def clean(s):