|
|
@@ -3,6 +3,8 @@
|
|
|
This recipe is step by step guide to improve Llama performance on Text2SQL measured with the popular [BIRD](https://bird-bench.github.io) benchmark. We generate a synthetic Chain of Thought(CoT) dataset and fine-tune Llama models on it.
|
|
|
|
|
|
Results:
|
|
|
+
|
|
|
+| Fine-tuning Combination | Accuracy |
|
|
|
|-----------------------------|-------------------------------|
|
|
|
| baseline | 39.47% |
|
|
|
| CoT, PEFT | 43.35% |
|
|
|
@@ -11,20 +13,20 @@ Results:
|
|
|
|
|
|
The complete steps are:
|
|
|
|
|
|
-1. Pre-processing the BIRD TRAIN datset by converting SQL statements into the conversation format.
|
|
|
+1. Pre-processing the [BIRD](https://bird-bench.github.io) TRAIN datset by converting text, schema, external knowledge, and SQL statements into the conversation format.
|
|
|
|
|
|
-2. We use the conversations from step 1, add CoT to these existing conversations using Llama-3.3-70B.
|
|
|
+2. Using Llama-3.3-70B to add CoT to the conversation format dataset.
|
|
|
|
|
|
-3. Fine-tuning Llama-3.1-8B on the dataset from step 2.
|
|
|
+3. Fine-tuning Llama-3.1-8B on the CoT dataset from step 2.
|
|
|
|
|
|
-4. We provide scripts to simplify running the [BIRD](https://bird-bench.github.io) eval benchmark on the fine-tuned models and compare it with out of the model.
|
|
|
+4. Running the BIRD DEV eval benchmark on the fine-tuned models and compare it with out of the model.
|
|
|
|
|
|
## Folder Structure
|
|
|
|
|
|
- quickstart folder: contains a notebook to ask Llama 3.3 to convert natural language queries into SQL queries.
|
|
|
- data folder: contains scripts to download the BIRD TRAIN and DEV datasets;
|
|
|
-- fine-tune folder: contains scripts to generate non-CoT and CoT datasets based on the BIRD TRAIN set and to supervised fine-tune Llama models using the datasets, with different SFT options (quantization or not, full fine-tuning or parameter-efficient fine-tuning);
|
|
|
-- eval folder: contains scripts to evaluate Llama models (original and fine-tuned) on the BIRD dataset;
|
|
|
+- fine-tune folder: contains scripts to generate CoT dataset based on the BIRD TRAIN set and to supervised fine-tune Llama models using the dataset, with different SFT options (quantization or not, full fine-tuning or parameter-efficient fine-tuning);
|
|
|
+- eval folder: contains scripts to evaluate Llama models (original and fine-tuned) on the BIRD dataset.
|
|
|
|
|
|
We also experimented with supervised fine-tuning (SFT) without CoT which resulted in slightly lower accuracy.
|
|
|
|