Jeff Tang 3 months ago
parent
commit
27a23afdd3
1 changed file with 8 additions and 6 deletions
      end-to-end-use-cases/coding/text2sql/README.md

+ 8 - 6
end-to-end-use-cases/coding/text2sql/README.md

@@ -3,6 +3,8 @@
 This recipe is a step-by-step guide to improving Llama performance on Text2SQL, measured with the popular [BIRD](https://bird-bench.github.io) benchmark. We generate a synthetic Chain of Thought (CoT) dataset and fine-tune Llama models on it.
 
 Results:
+
+| Fine-tuning Combination     | Accuracy (BIRD DEV)           |
 |-----------------------------|-------------------------------|
 | baseline                    | 39.47%                        |
 | CoT, PEFT                   | 43.35%                        |
@@ -11,20 +13,20 @@ Results:
 
 The complete steps are:
 
-1. Pre-processing the BIRD TRAIN datset by converting SQL statements into the conversation format.
+1. Pre-processing the [BIRD](https://bird-bench.github.io) TRAIN dataset by converting text, schema, external knowledge, and SQL statements into the conversation format.
 
-2. We use the conversations from step 1, add CoT to these existing conversations using Llama-3.3-70B.
+2. Using Llama-3.3-70B to add CoT to the conversation-format dataset from step 1 (steps 1 and 2 are sketched in code right after this list).
 
-3. Fine-tuning Llama-3.1-8B on the dataset from step 2.
+3. Fine-tuning Llama-3.1-8B on the CoT dataset from step 2.
 
-4. We provide scripts to simplify running the [BIRD](https://bird-bench.github.io) eval benchmark on the fine-tuned models and compare it with out of the model.
+4. Running the BIRD DEV eval benchmark on the fine-tuned models and comparing the results with those of the original (non-fine-tuned) models.
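
The scripts in this repo implement steps 1 and 2; the snippet below is only a minimal sketch of what that data preparation can look like, assuming the BIRD JSON field names (`question`, `evidence`, `SQL`) and an OpenAI-compatible endpoint serving Llama-3.3-70B. The system prompt, helper names, endpoint URL, and file paths are illustrative assumptions, not the recipe's actual code.

```python
# Minimal sketch of steps 1-2 (not the recipe's actual scripts).
# Assumptions: BIRD JSON field names ("question", "evidence", "SQL"),
# an OpenAI-compatible server hosting Llama-3.3-70B, and a toy schema string.
import json

from openai import OpenAI

SYSTEM_PROMPT = (
    "You are a text-to-SQL assistant. Given a database schema, external "
    "knowledge, and a question, reason step by step and output the SQL query."
)


def to_conversation(example: dict, schema_ddl: str) -> list[dict]:
    """Step 1: pack text, schema, external knowledge, and gold SQL into chat messages."""
    user_msg = (
        f"-- Database schema:\n{schema_ddl}\n\n"
        f"-- External knowledge: {example.get('evidence', '')}\n\n"
        f"-- Question: {example['question']}"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_msg},
        {"role": "assistant", "content": example["SQL"]},  # gold SQL as the target
    ]


def add_cot(client: OpenAI, conversation: list[dict]) -> list[dict]:
    """Step 2: ask Llama-3.3-70B to expand the gold SQL into a CoT answer."""
    system, user, gold = conversation
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model id on the server
        messages=[
            system,
            {
                "role": "user",
                "content": (
                    user["content"]
                    + f"\n\nThe correct SQL is:\n{gold['content']}\n"
                    "Explain step by step how to derive it, then restate the SQL."
                ),
            },
        ],
        temperature=0.0,
    )
    cot_answer = resp.choices[0].message.content
    return [system, user, {"role": "assistant", "content": cot_answer}]


if __name__ == "__main__":
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed vLLM-style endpoint
    examples = json.load(open("train/train.json"))  # path to BIRD TRAIN annotations is an assumption
    sample = to_conversation(examples[0], schema_ddl="CREATE TABLE frpm (...);")  # toy schema
    print(json.dumps(add_cot(client, sample), indent=2))
```

Generating the CoT answer from the gold SQL (rather than asking the model to solve the question from scratch) keeps the synthetic reasoning grounded in a query that is known to be correct.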
 
 ## Folder Structure
 
 - quickstart folder: contains a notebook to ask Llama 3.3 to convert natural language queries into SQL queries.
 - data folder: contains scripts to download the BIRD TRAIN and DEV datasets;
-- fine-tune folder: contains scripts to generate non-CoT and CoT datasets based on the BIRD TRAIN set and to supervised fine-tune Llama models using the datasets, with different SFT options (quantization or not, full fine-tuning or parameter-efficient fine-tuning);
-- eval folder: contains scripts to evaluate Llama models (original and fine-tuned) on the BIRD dataset;
+- fine-tune folder: contains scripts to generate a CoT dataset based on the BIRD TRAIN set and to run supervised fine-tuning of Llama models on that dataset, with different SFT options (quantized or not, full fine-tuning or parameter-efficient fine-tuning; see the sketch after this list);
+- eval folder: contains scripts to evaluate Llama models (original and fine-tuned) on the BIRD dataset.
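
As a rough illustration of those SFT options (PEFT vs. full fine-tuning, quantized or not), here is a minimal sketch using Hugging Face TRL and PEFT rather than the recipe's own fine-tune scripts; the model id, dataset file, LoRA hyperparameters, and trainer arguments are assumptions, and the exact `SFTTrainer` arguments vary across TRL versions.

```python
# Sketch: parameter-efficient SFT of Llama-3.1-8B on the CoT conversations.
# Assumptions: HF model id, dataset file name, and hyperparameters below;
# the dataset rows hold a "messages" list in chat format.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="train_cot_conversations.json", split="train")

peft_config = LoraConfig(          # pass peft_config=None for full fine-tuning
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed HF model id
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="llama31-8b-text2sql-cot",
        num_train_epochs=1,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        bf16=True,
    ),
)
trainer.train()
```

Dropping `peft_config` switches the sketch to full fine-tuning, and loading the base model in 4-bit (for example with a BitsAndBytes quantization config) would correspond to the quantized option.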
 
 We also experimented with supervised fine-tuning (SFT) without CoT, which resulted in slightly lower accuracy.