@@ -23,7 +23,7 @@ Given those differences, our reproduced number can not be compared to the number
## Environment setups
-Please install our lm-evaluation-harness and llama-recipe repo by following:
+Please install lm-evaluation-harness and our llama-recipe repo by running:
```
pip install lm-eval[math,ifeval,sentencepiece,vllm]==0.4.3
@@ -83,7 +83,7 @@ data_parallel_size: 4 # The VLLM argument that speicify the data parallel size f
python prepare_meta_eval.py --config_path ./eval_config.yaml
```
- By default,this will load the default [eval_config.yaml](./eval_config.yaml) config and print out a CLI command to run `meta_instruct` group tasks, which includes `meta_ifeval`, `meta_math_hard`, `meta_gpqa` and `meta_mmlu_pro_instruct`, for `meta-llama/Meta-Llama-3.1-8B-Instruct` model using `meta-llama/Meta-Llama-3.1-8B-Instruct-evals` dataset and `lm_eval`.
+ This script will load the default [eval_config.yaml](./eval_config.yaml) config and print out an `lm_eval` command to run the `meta_instruct` task group, which includes `meta_ifeval`, `meta_math_hard`, `meta_gpqa` and `meta_mmlu_pro_instruct`, for the `meta-llama/Meta-Llama-3.1-8B-Instruct` model using the `meta-llama/Meta-Llama-3.1-8B-Instruct-evals` dataset.
An example output from [prepare_meta_eval.py](./prepare_meta_eval.py) looks like this: