This recipe is a step-by-step guide to improving Llama performance on Text2SQL, measured with the popular BIRD benchmark. We generate a synthetic Chain of Thought (CoT) dataset and fine-tune Llama models on it.
Results:

| Fine-tuning Combination | Accuracy |
|---|---|
| baseline | 39.47% |
| CoT, PEFT | 43.35% |
| CoT, FFT (3 epochs) | 42.44% |
| CoT, FFT (10 epochs) | 43.87% |
The complete steps are:
1. Pre-processing the BIRD TRAIN dataset by converting text, schema, external knowledge, and SQL statements into the conversation format (see the first sketch after this list).
2. Using Llama-3.3-70B to add CoT to the conversation-format dataset (see the second sketch after this list).
3. Fine-tuning Llama-3.1-8B on the CoT dataset from step 2 (a LoRA sketch appears at the end of this section).
4. Running the BIRD DEV eval benchmark on the fine-tuned models and comparing them with the out-of-the-box model.
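
To make step 1 concrete, here is a minimal sketch of converting one BIRD TRAIN record into the conversation format. It is illustrative only: the record fields (`db_id`, `question`, `evidence`, `SQL`) follow the BIRD JSON layout, while the file paths and the exact prompt wording are assumptions, not the recipe's actual pre-processing script.

```python
import json
import sqlite3

def schema_ddl(db_path: str) -> str:
    """Read the CREATE TABLE statements straight from the SQLite catalog."""
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    con.close()
    return "\n".join(r[0] for r in rows)

def to_conversation(rec: dict, db_root: str = "train_databases") -> dict:
    """Pack schema, external knowledge, and question into one user turn."""
    db_path = f"{db_root}/{rec['db_id']}/{rec['db_id']}.sqlite"
    user = (
        f"Database schema:\n{schema_ddl(db_path)}\n\n"
        f"External knowledge: {rec.get('evidence', '')}\n\n"
        f"Question: {rec['question']}"
    )
    return {"messages": [
        {"role": "user", "content": user},
        {"role": "assistant", "content": rec["SQL"]},
    ]}

with open("train.json") as f:
    dataset = [to_conversation(rec) for rec in json.load(f)]
```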
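For step 2, the CoT annotation can be sketched as prompting Llama-3.3-70B to explain how the gold SQL answers the question, then storing that reasoning as the assistant turn. The sketch below assumes an OpenAI-compatible endpoint (for example a local vLLM server); the URL, model name, and prompt wording are placeholders rather than the recipe's actual annotation pipeline.

```python
from openai import OpenAI

# Assumption: an OpenAI-compatible server hosting Llama-3.3-70B, e.g. vLLM.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

COT_PROMPT = (
    "Given the schema, question, and gold SQL below, write step-by-step "
    "reasoning that derives the SQL, then restate the SQL at the end.\n\n{task}"
)

def add_cot(conv: dict) -> dict:
    """Replace the plain-SQL assistant turn with a CoT + SQL answer."""
    task = f"{conv['messages'][0]['content']}\n\nGold SQL: {conv['messages'][1]['content']}"
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",
        messages=[{"role": "user", "content": COT_PROMPT.format(task=task)}],
        temperature=0.0,
    )
    conv["messages"][1]["content"] = resp.choices[0].message.content
    return conv
```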
We also experimented with supervised fine-tuning (SFT) without CoT, which resulted in slightly lower accuracy.
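
For step 3, a generic LoRA (PEFT) fine-tuning run over the CoT conversations could look like the sketch below, using Hugging Face `datasets`, `peft`, and `trl`. The hyperparameters, file name, and output directory are illustrative assumptions, not the exact configuration behind the numbers in the table above.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Assumed file: the CoT conversations from step 2, one JSON object per
# line, each with a "messages" field in chat format.
dataset = load_dataset("json", data_files="train_cot.jsonl", split="train")

peft_config = LoraConfig(
    r=8,                      # illustrative rank; tune for your budget
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="llama31-8b-text2sql-cot", num_train_epochs=3),
)
trainer.train()
```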