
minor fix

Kai Wu, 8 months ago
Parent commit e354eee1fb
1 changed file with 1 addition and 1 deletion

tools/benchmarks/llm_eval_harness/meta_eval_reproduce/README.md (+1 −1)

@@ -11,7 +11,7 @@ As Meta Llama models gain popularity, evaluating these models has become increas
 
 ## Insights from Our Evaluation Process
 
-There are 4 major differences in terms of the eval configurations and prompting methods between this implementation and Hugging Face [leaderboard implementation](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/leaderboard).
+Here are our insights about the differences in terms of the eval configurations and prompting methods between this implementation and Hugging Face [leaderboard implementation](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/leaderboard).
 
- **Prompts**: We use Chain-of-Thought (CoT) prompts, while the Hugging Face leaderboard does not. The prompts that define the output format also differ.
- **Metric calculation**: For the MMLU-Pro, BBH, and GPQA tasks, we ask the model to generate a response and score the answer parsed from that response, while the Hugging Face leaderboard evaluation compares the log likelihoods of all label words, such as [ (A),(B),(C),(D) ] (see the sketch below).
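To make the metric-calculation difference concrete, here is a minimal Python sketch contrasting the two scoring styles. It is not code from either harness; the regex, function names, and toy log-probabilities are illustrative assumptions.

```python
import re

def score_generative(generated_text: str, gold_choice: str) -> bool:
    """Generation-based scoring: parse the final answer out of a CoT response
    and compare it to the gold label (the approach described above)."""
    match = re.search(r"answer is \(?([A-D])\)?", generated_text)
    return bool(match) and match.group(1) == gold_choice

def score_loglikelihood(label_logprobs: dict[str, float], gold_choice: str) -> bool:
    """Log-likelihood scoring: pick the label word with the highest model
    log probability (the style used by the Hugging Face leaderboard)."""
    predicted = max(label_logprobs, key=label_logprobs.get)
    return predicted.strip("()") == gold_choice

# Toy usage with made-up values:
print(score_generative("... so the answer is (B).", "B"))                              # True
print(score_loglikelihood({"(A)": -2.1, "(B)": -0.4, "(C)": -3.0, "(D)": -2.7}, "B"))  # True
```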