This folder contains scripts for reinforcement fine-tuning of Llama models on the Text2SQL task using GRPO.
```
git clone https://github.com/meta-llama/llama-cookbook
cd llama-cookbook
git checkout text2sql
cd end-to-end-use-cases/coding/text2sql/data
sh download_dev_unzip.sh
sh download_train_unzip.sh
```
The LLM-as-a-judge reward uses the Together API, so install the client and set your API key:

```
pip install together
export TOGETHER_API_KEY=<your together.ai api key>
```
If you don't want to use the LLM-as-a-judge reward, you can comment out this line when calling GRPOTrainer and change the reward weights here to [1.0, 3.0, 1.0].
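As a rough illustration of why those weights matter (the function names below are hypothetical, not the repo's actual reward functions in grpo_text2sql.py): GRPO combines each reward function's score into a single weighted sum, so re-weighting an entry to 0.0, or dropping it, removes that signal from training.

```python
# Sketch of GRPO-style reward weighting. The three reward functions
# here are hypothetical stand-ins for the ones in grpo_text2sql.py.

def format_reward(completion: str) -> float:
    # 1.0 if the completion looks like a SQL query, else 0.0.
    return 1.0 if completion.strip().upper().startswith("SELECT") else 0.0

def execution_reward(completion: str, gold: str) -> float:
    # Placeholder: a real implementation would execute both queries
    # against the database and compare result sets; here we just
    # compare normalized text.
    return 1.0 if completion.strip().lower() == gold.strip().lower() else 0.0

def judge_reward(completion: str) -> float:
    # Placeholder for an LLM-as-a-judge score in [0, 1].
    return 0.5

def combined_reward(completion: str, gold: str,
                    weights=(1.0, 3.0, 1.0)) -> float:
    # GRPOTrainer's reward_weights work analogously: a weighted sum
    # of the per-function scores for each sampled completion.
    scores = [
        format_reward(completion),
        execution_reward(completion, gold),
        judge_reward(completion),
    ]
    return sum(w * s for w, s in zip(weights, scores))

print(combined_reward("SELECT name FROM users", "select name from users"))
# -> 4.5  (1.0*1.0 + 3.0*1.0 + 1.0*0.5)
```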
```
cd ../fine-tuning/grpo
pip install -r requirements.txt
```
Then launch the training (adjust `--num_processes` and `--gpu_ids` to match your available GPUs):

```
accelerate launch --num_processes 6 --gpu_ids 2,3,4,5,6,7 --config_file deepspeed_zero3.yaml grpo_text2sql.py --config grpo-llama323b-text2sql.yaml
```
You can modify the `grpo-llama323b-text2sql.yaml` file to tune `num_generations`, `learning_rate`, `reward_weights`, and other parameters.
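For reference, the tunable fields in such a config look roughly like this; the values below are illustrative examples, not the repo's actual defaults (check `grpo-llama323b-text2sql.yaml` for those):

```yaml
# Illustrative excerpt only -- values are examples, not the repo's defaults.
num_generations: 8          # completions sampled per prompt in each GRPO group
learning_rate: 1.0e-6
reward_weights: [1.0, 3.0, 1.0]   # one weight per reward function
```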