Kai Wu 8 months ago
parent
commit
2f0a006b68

+ 6 - 0
.github/scripts/spellcheck_conf/wordlist.txt

@@ -1451,3 +1451,9 @@ openhathi
 sarvam
 subtask
 acc
+BigBench
+IFEval
+MuSR
+Multistep
+multistep
+algorithmically

+ 1 - 1
tools/benchmarks/README.md

@@ -1,4 +1,4 @@
 # Benchmarks
 
 * inference - a folder contains benchmark scripts that apply a throughput analysis for Llama models inference on various backends including on-prem, cloud and on-device.
-* llm_eval_harness - a folder contains a tool to evaluate fine-tuned Llama models including quantized models focusing on quality.  
+* llm_eval_harness - a folder that introduces `lm-evaluation-harness`, a tool for evaluating Llama models, including quantized models, with a focus on quality. We also include a recipe that reproduces Meta Llama 3.1 evaluation metrics using `lm-evaluation-harness`, and instructions for reproducing Hugging Face Open LLM Leaderboard v2 metrics.

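The README change above points readers at `lm-evaluation-harness`. As a rough sketch of what such an evaluation run looks like, a minimal CLI invocation might be the following; the model id, task choice, and batch size are illustrative assumptions, not taken from this commit:

```shell
# Sketch: evaluate a Llama model on IFEval with lm-evaluation-harness.
# Assumes `pip install lm-eval` and access to the model weights; the
# model id and task name below are examples, not from this diff.
lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-3.1-8B-Instruct \
    --tasks ifeval \
    --batch_size 8 \
    --output_path ./eval_results
```

The `--tasks` flag accepts a comma-separated list, so benchmarks such as those added to the wordlist (IFEval, MuSR, BigBench-style suites) can be combined in one run if the corresponding task configs are available.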
File diff suppressed because it is too large
+ 3 - 3
tools/benchmarks/llm_eval_harness/README.md