
Update README.md

Erik Dunteman · 7 months ago · commit 8967fa87b9
1 changed file with 13 additions and 6 deletions
      recipes/3p_integrations/modal/many-llamas-human-eval/README.md


@@ -1,20 +1,18 @@
-See `rune2e.sh` for info on how to run the experiment.
-
-# Many Llamas Human Eval
+# Many-Llamas Human-Eval
 
 In this directory, we run an experiment answering the question:
 
 *If we run enough Llama models in parallel, can they outperform GPT-4o on HumanEval?*
 
-It seeks to increase model performance not by scaling parameters, but by scaling compute time.
+It seeks to increase model performance not through scaling parameters, but by scaling compute time.
 
 ### Technical Blog
 
-This experiment has been built and run by the team at [Modal](https://modal.com), and is described in the following blog post:
+This experiment was built by the team at [Modal](https://modal.com), and is described in the following blog post:
 
 [Beat GPT-4o at Python by searching with 100 dumb LLaMAs](https://modal.com/blog/llama-human-eval)
 
-The experiment has since been adapted to use the [Llama 3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) model, and run end-to-end using the Modal serverless platform.
+The experiment has since been upgraded to use the [Llama 3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) model, and is runnable end-to-end on the Modal serverless platform.
 
 ## Run it yourself
 
@@ -34,6 +32,12 @@ That's all!
 
 This CLI will execute your modal apps, which build and run containers on the cloud, on your GPU of choice.
 
+### HuggingFace Pull Access
+
+To download the model, you'll first need to accept the [Llama 3.2 License](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) on HuggingFace and be approved for access.
+
+Then, create a [Modal secret](https://modal.com/secrets) named `huggingface`, to which you'll add your `HF_TOKEN` as an environment variable.
+
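If you prefer the CLI over the web dashboard, the same secret can be created with Modal's `secret create` command; a sketch, where `hf_xxxxxxxx` is a placeholder for your real token:

```shell
# Create a Modal secret named "huggingface" holding your HuggingFace token.
# Replace hf_xxxxxxxx with the token from https://huggingface.co/settings/tokens
modal secret create huggingface HF_TOKEN=hf_xxxxxxxx
```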
 ### Run The Experiment
 
 This command will run every step for you:
@@ -58,7 +62,10 @@ The resulting plots of the evals will be saved locally to:
 
 `/tmp/plot-pass-k.jpeg` shows pass@k for the Llama 3.2 3B Instruct model vs pass@1 for GPT-4o. 
 
+![plot-pass-k](https://github.com/user-attachments/assets/11e9dc6e-4322-4d44-b928-4ed7c4ce8262)
+
 You'll see that at 100 generations, the Llama model is able to perform on-par with GPT-4o. At higher scale, the Llama model will outperform GPT-4o.
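pass@k here follows the standard HumanEval-style unbiased estimator (the repo's own eval code may compute it slightly differently); a minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n total (c of them correct), passes."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k must
        # include at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# fail@k is simply the complement: 1 - pass@k
```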
 
 `/tmp/plot-fail-k.jpeg` shows fail@k across a log-scale, showing smooth scaling of this method.
 
+![plot-fail-k](https://github.com/user-attachments/assets/7286e4ff-5090-4288-bd62-8a078c6dc5a1)