
refactor folder: move meta_eval_reproduce under llm_eval_harness

Kai Wu · 8 months ago
commit 307510b8c5
20 changed files with 19 additions and 14 deletions
  1. .github/scripts/spellcheck_conf/wordlist.txt (+3 -0)
  2. requirements.txt (+0 -4)
  3. tools/benchmarks/llm_eval_harness/README.md (+4 -0)
  4. tools/benchmarks/meta_eval_reproduce/README.md (+12 -10)
  5. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/eval_config.yaml (moved, +0 -0)
  6. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_eval.py (moved, +0 -0)
  7. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/bbh/bbh_3shot_cot.yaml (moved, +0 -0)
  8. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/bbh/utils.py (moved, +0 -0)
  9. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/gpqa_cot/gpqa_0shot_cot.yaml (moved, +0 -0)
  10. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/gpqa_cot/utils.py (moved, +0 -0)
  11. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/ifeval/ifeval.yaml (moved, +0 -0)
  12. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/ifeval/utils.py (moved, +0 -0)
  13. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/math_hard/math_hard_0shot_cot.yaml (moved, +0 -0)
  14. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/math_hard/utils.py (moved, +0 -0)
  15. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/meta_instruct.yaml (moved, +0 -0)
  16. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/meta_pretrain.yaml (moved, +0 -0)
  17. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/mmlu_pro/mmlu_pro_5shot_cot_instruct.yaml (moved, +0 -0)
  18. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/mmlu_pro/mmlu_pro_5shot_cot_pretrain.yaml (moved, +0 -0)
  19. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/mmlu_pro/utils.py (moved, +0 -0)
  20. tools/benchmarks/llm_eval_harness/meta_eval_reproduce/prepare_dataset.py (moved, +0 -0)

+ 3 - 0
.github/scripts/spellcheck_conf/wordlist.txt

@@ -1443,3 +1443,6 @@ ifeval
 lighteval
 sqrt
 wis
+evals
+mmlu
+parsers

+ 0 - 4
requirements.txt

@@ -29,7 +29,3 @@ langchain
 langchain_community
 sentence_transformers
 codeshield
-lm-eval==0.4.3
-immutabledict
-antlr4-python3-runtime==4.11
-nltk=3.8.1
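
The four lines removed above are the eval-harness-specific dependencies. If you run the meta evals on their own, a standalone requirements fragment along these lines should cover them (a sketch based only on the removed lines; note the original `nltk=3.8.1` used a single `=`, which pip rejects — a valid version pin uses `==`):

```text
# Eval-harness dependencies removed from the main requirements.txt
lm-eval==0.4.3
immutabledict
antlr4-python3-runtime==4.11
nltk==3.8.1
```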

+ 4 - 0
tools/benchmarks/llm_eval_harness/README.md

File diff suppressed because it is too large

+ 12 - 10
tools/benchmarks/meta_eval_reproduce/README.md

File diff suppressed because it is too large


tools/benchmarks/meta_eval_reproduce/eval_config.yaml → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/eval_config.yaml

tools/benchmarks/meta_eval_reproduce/meta_eval.py → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_eval.py

tools/benchmarks/meta_eval_reproduce/meta_template/bbh/bbh_3shot_cot.yaml → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/bbh/bbh_3shot_cot.yaml

tools/benchmarks/meta_eval_reproduce/meta_template/bbh/utils.py → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/bbh/utils.py

tools/benchmarks/meta_eval_reproduce/meta_template/gpqa_cot/gpqa_0shot_cot.yaml → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/gpqa_cot/gpqa_0shot_cot.yaml

tools/benchmarks/meta_eval_reproduce/meta_template/gpqa_cot/utils.py → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/gpqa_cot/utils.py

tools/benchmarks/meta_eval_reproduce/meta_template/ifeval/ifeval.yaml → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/ifeval/ifeval.yaml

tools/benchmarks/meta_eval_reproduce/meta_template/ifeval/utils.py → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/ifeval/utils.py

tools/benchmarks/meta_eval_reproduce/meta_template/math_hard/math_hard_0shot_cot.yaml → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/math_hard/math_hard_0shot_cot.yaml

tools/benchmarks/meta_eval_reproduce/meta_template/math_hard/utils.py → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/math_hard/utils.py

tools/benchmarks/meta_eval_reproduce/meta_template/meta_instruct.yaml → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/meta_instruct.yaml

tools/benchmarks/meta_eval_reproduce/meta_template/meta_pretrain.yaml → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/meta_pretrain.yaml

tools/benchmarks/meta_eval_reproduce/meta_template/mmlu_pro/mmlu_pro_5shot_cot_instruct.yaml → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/mmlu_pro/mmlu_pro_5shot_cot_instruct.yaml

tools/benchmarks/meta_eval_reproduce/meta_template/mmlu_pro/mmlu_pro_5shot_cot_pretrain.yaml → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/mmlu_pro/mmlu_pro_5shot_cot_pretrain.yaml

tools/benchmarks/meta_eval_reproduce/meta_template/mmlu_pro/utils.py → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/meta_template/mmlu_pro/utils.py

tools/benchmarks/meta_eval_reproduce/prepare_dataset.py → tools/benchmarks/llm_eval_harness/meta_eval_reproduce/prepare_dataset.py
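
Every rename above is the same prefix swap, so any lingering references in scripts or docs can be updated mechanically. A minimal sketch (the sample input line is hypothetical, not from the repo):

```shell
# Old and new prefixes from the renames in this commit.
old='tools/benchmarks/meta_eval_reproduce'
new='tools/benchmarks/llm_eval_harness/meta_eval_reproduce'

# Rewrite a reference to the old location; sed's s|…|…|g uses | as the
# delimiter so the slashes in the paths need no escaping.
printf 'see %s/eval_config.yaml\n' "$old" | sed "s|$old|$new|g"
```

The same substitution could be applied across a working tree with `grep -rl "$old" . | xargs sed -i "s|$old|$new|g"`, though that should be reviewed before committing.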