
Update README.md

Erik Dunteman 7 months ago
commit 8967fa87b9
1 changed file with 13 additions and 6 deletions
recipes/3p_integrations/modal/many-llamas-human-eval/README.md (+13 −6)

@@ -1,20 +1,18 @@
-See `rune2e.sh` for info on how to run the experiment.
-
-# Many Llamas Human Eval
+# Many-Llamas Human-Eval

 In this directory, we run an experiment answering the question:

 *If we run enough Llama models in parallel, can they outperform GPT-4o on HumanEval?*

-It seeks to increase model performance not by scaling parameters, but by scaling compute time.
+It seeks to increase model performance not through scaling parameters, but by scaling compute time.

 ### Technical Blog

-This experiment has been built and run by the team at [Modal](https://modal.com), and is described in the following blog post:
+This experiment was built by the team at [Modal](https://modal.com) and is described in the following blog post:

 [Beat GPT-4o at Python by searching with 100 dumb LLaMAs](https://modal.com/blog/llama-human-eval)

-The experiment has since been adapted to use the [Llama 3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) model, and run end-to-end using the Modal serverless platform.
+The experiment has since been upgraded to use the [Llama 3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) model, and is now runnable end-to-end on the Modal serverless platform.

 ## Run it yourself

@@ -34,6 +32,12 @@ That's all!
 
 
 This CLI will execute your modal apps, which build and run containers in the cloud, on your GPU of choice.
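For orientation, here is a minimal sketch of what such a Modal app looks like (app and function names are hypothetical, assuming the `modal` Python package is installed and authenticated):

```python
# A minimal Modal app sketch -- illustrative names, not this recipe's actual code.
import modal

app = modal.App("many-llamas-demo")  # hypothetical app name

@app.function(gpu="A10G")  # Modal builds and runs this in a GPU-backed cloud container
def gpu_check() -> str:
    import subprocess
    # Runs inside the remote container, not on your machine
    return subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout

@app.local_entrypoint()
def main():
    print(gpu_check.remote())  # .remote() executes the function in the cloud
```

Invoking it with `modal run` builds the container image on first use and streams output back to your terminal.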
 
 
+### HuggingFace Pull Access
+
+To download the model, you'll first need to accept the [Llama 3.2 License](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) on HuggingFace and be approved for access.
+
+Then, create a [modal secret](https://modal.com/secrets) named `huggingface`, to which you'll add your `HF_TOKEN` as an environment variable.
+
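As a sketch of how a Modal function would consume that secret (illustrative names; Modal injects each key in the secret as an environment variable inside the container):

```python
# Sketch: attaching the `huggingface` secret so HF_TOKEN is available at runtime.
import os
import modal

app = modal.App("hf-token-demo")  # hypothetical app name

@app.function(secrets=[modal.Secret.from_name("huggingface")])
def check_token():
    # The secret's keys appear as environment variables at runtime
    token = os.environ["HF_TOKEN"]
    print("token loaded:", token[:6] + "...")
```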
 ### Run The Experiment

 This command will run every step for you:
@@ -58,7 +62,10 @@ The resulting plots of the evals will be saved locally to:
 
 
 `/tmp/plot-pass-k.jpeg` shows pass@k for the Llama 3.2 3B Instruct model vs pass@1 for GPT-4o.
 
 
+![plot-pass-k](https://github.com/user-attachments/assets/11e9dc6e-4322-4d44-b928-4ed7c4ce8262)
+
 You'll see that at 100 generations, the Llama model performs on par with GPT-4o. At higher scale, the Llama model will outperform GPT-4o.
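For reference, pass@k is conventionally computed with the unbiased estimator from the original HumanEval paper (Chen et al., 2021); this is a sketch, not necessarily the exact code this recipe uses:

```python
# Unbiased pass@k estimator: pass@k = 1 - C(n - c, k) / C(n, k)
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n: total generations, c: generations passing the tests, k: sample budget."""
    if n - c < k:
        return 1.0  # every size-k draw must include at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: 270 of 1000 generations pass
print(pass_at_k(1000, 270, 1))    # ~0.27
print(pass_at_k(1000, 270, 100))  # ~1.0
```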
 
 
 `/tmp/plot-fail-k.jpeg` shows fail@k on a log scale, demonstrating the smooth scaling of this method.
 
 
+![plot-fail-k](https://github.com/user-attachments/assets/7286e4ff-5090-4288-bd62-8a078c6dc5a1)
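fail@k is the complement, 1 − pass@k, which is why a log-scale plot makes the scaling trend easy to read. A quick plotting sketch (matplotlib assumed; the numbers and output path are illustrative):

```python
# Sketch: fail@k = C(n - c, k) / C(n, k), i.e. the chance all k samples fail.
from math import comb
import matplotlib.pyplot as plt

def fail_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 0.0
    return comb(n - c, k) / comb(n, k)

ks = [1, 2, 5, 10, 20, 50, 100]
plt.loglog(ks, [fail_at_k(1000, 270, k) for k in ks], marker="o")
plt.xlabel("k"); plt.ylabel("fail@k")
plt.savefig("/tmp/plot-fail-k-sketch.jpeg")  # hypothetical path, to avoid clobbering the real plot
```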