@@ -52,11 +52,11 @@ sh download_dev_unzip.sh
 
3. Run the evaluation script `sh llama_eval.sh`, which uses the BIRD DEV dataset (1,534 examples in total) with external knowledge turned on, runs the Llama model on each text question, and compares the generated SQL with the gold SQL (a sketch of this comparison follows below).
 
-*Note:* If your API key or model name is incorrect, the script will exit with an authentication or model not supported error.
+If your API key or model name is incorrect, the script will exit with an authentication or model-not-supported error.
 
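BIRD-style text2sql evaluation typically scores each example by execution accuracy: the predicted SQL and the gold SQL are both run against the example's database, and the prediction counts as correct only if the two result sets match. Here is a minimal sketch of that comparison; the database path and queries are illustrative, and the actual implementation lives in the evaluation code invoked by `llama_eval.sh`.

```python
import sqlite3

def execution_match(predicted_sql: str, gold_sql: str, db_path: str) -> bool:
    """Return True if both queries yield the same result set on the database."""
    conn = sqlite3.connect(db_path)
    try:
        predicted_rows = set(conn.execute(predicted_sql).fetchall())
        gold_rows = set(conn.execute(gold_sql).fetchall())
    except sqlite3.Error:
        return False  # a query that fails to execute counts as wrong
    finally:
        conn.close()
    return predicted_rows == gold_rows

# Illustrative call against one BIRD DEV database file (path is hypothetical).
print(execution_match(
    "SELECT COUNT(*) FROM schools",
    "SELECT COUNT(*) FROM schools",
    "dev_databases/california_schools/california_schools.sqlite",
))
```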
After the script completes, you'll see the accuracy of the Llama model on the BIRD DEV text2sql benchmark. For example, the total accuracy is about 54.24% with `YOUR_API_KEY` set to your Llama API key and `model='Llama-3.3-70B-Instruct'`, or about 35.07% with `YOUR_API_KEY` set to your Together API key and `model='meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo'`.
 
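The reported total accuracy is simply the fraction of the 1,534 DEV examples whose prediction execution-matches the gold SQL. A sketch, reusing the hypothetical `execution_match` helper above:

```python
def total_accuracy(examples) -> float:
    """Share of (predicted_sql, gold_sql, db_path) triples that execution-match."""
    correct = sum(execution_match(pred, gold, db) for pred, gold, db in examples)
    return correct / len(examples)

# On the full DEV set len(examples) == 1534, so an accuracy of about
# 54.24% corresponds to roughly 832 correct predictions.
```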
-*Note:* To compare your evaluated accuracy of your selected Llama model with other results in the BIRD Dev leaderboard, click [here](https://bird-bench.github.io/).
+To compare the evaluated accuracy of your selected Llama model with other results on the BIRD Dev leaderboard, click [here](https://bird-bench.github.io/).
 
### Evaluation Process