This recipe demonstrates how to fine-tune the Llama 3.2 11B Vision model on a synthetic W-2 tax form dataset for structured information extraction. The tutorial compares LoRA (Low-Rank Adaptation) and full parameter fine-tuning, evaluating their trade-offs in accuracy, memory consumption, and compute requirements.

Clone the repository and set up the environment:

```bash
git clone git@github.com:meta-llama/llama-cookbook.git
cd llama-cookbook/getting-started/finetuning/vision
conda create -n image-ft python=3.10 -y
conda activate image-ft
pip install -r requirements.txt
```
Install torchtune nightly for the latest vision model support:

```bash
pip install --pre --upgrade torchtune --extra-index-url https://download.pytorch.org/whl/nightly/cpu
```
Important: Log in to your HuggingFace account to download the model and datasets:

```bash
huggingface-cli login
```
The dataset contains 2,000 examples of synthetic W-2 forms with three splits: train (1,800), test (100), and validation (100). For this use case, we found that a smaller training split (30% train / 70% test) gave a sufficient improvement while leaving more examples for evaluation.
Run the preparation script to regenerate the splits:

```bash
python prepare_w2_dataset.py --train-ratio 0.3
```
This creates a new dataset directory: `fake_w2_us_tax_form_dataset_train30_test70`.
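If you want to sanity-check the resulting split sizes, here is a minimal sketch, assuming the preparation script writes Hugging Face `datasets`-style splits to disk (verify against the script's actual output format):

```python
# Minimal split check -- assumes `save_to_disk`-style output; adjust if the
# preparation script writes a different format.
from datasets import load_from_disk

for split in ("train", "test"):
    ds = load_from_disk(f"fake_w2_us_tax_form_dataset_train30_test70/{split}")
    print(split, len(ds), ds.column_names)
```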
Configuration Note: If you change the train ratio, update the `dataset.data_files.train` path in the corresponding YAML configuration files.
Download the base Llama 3.2 11B Vision model:

```bash
tune download meta-llama/Llama-3.2-11B-Vision-Instruct --output-dir Llama-3.2-11B-Vision-Instruct
```
This downloads to the expected directory structure used in the provided YAML files. If you change the directory, update these keys in the configuration files:
- `checkpointer.checkpoint_dir`
- `tokenizer.path`
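For orientation, those keys look roughly like this in the configs; the values follow the download command above, but the exact sub-paths (such as the tokenizer file name) are assumptions and may differ in the shipped YAML:

```yaml
# Illustrative excerpt -- confirm against 11B_full_w2.yaml / 11B_lora_w2.yaml.
checkpointer:
  checkpoint_dir: Llama-3.2-11B-Vision-Instruct/
tokenizer:
  path: Llama-3.2-11B-Vision-Instruct/original/tokenizer.model  # assumed file name
```

Before fine-tuning, establish a baseline by evaluating the pre-trained model on the test set.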
For a single GPU (H100):

```bash
CUDA_VISIBLE_DEVICES="0" python -m vllm.entrypoints.openai.api_server --model Llama-3.2-11B-Vision-Instruct/ --port 8001 --max-model-len 65000 --max-num-seqs 10
```

For a multi-GPU setup:

```bash
CUDA_VISIBLE_DEVICES="0,1" python -m vllm.entrypoints.openai.api_server --model Llama-3.2-11B-Vision-Instruct/ --port 8001 --max-model-len 65000 --tensor-parallel-size 2 --max-num-seqs 10
```
```bash
python evaluate.py --server_url http://localhost:8001 --model Llama-3.2-11B-Vision-Instruct/ --structured --dataset fake_w2_us_tax_form_dataset_train30_test70/test --limit 200
```
The repository includes two pre-configured YAML files:
- `11B_full_w2.yaml`: Full parameter fine-tuning configuration
- `11B_lora_w2.yaml`: LoRA fine-tuning configuration

Key differences: full parameter fine-tuning updates all of the model's weights, which yields the largest gains on the target task but requires far more GPU memory and compute, while LoRA fine-tuning freezes the base weights and trains small low-rank adapter matrices, cutting memory use and training cost at some cost in task accuracy (see the results below).
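For reference, LoRA runs in torchtune are typically controlled by a handful of adapter hyperparameters; the block below is an illustrative sketch (the builder name and values are assumptions), not a copy of `11B_lora_w2.yaml`:

```yaml
# Illustrative LoRA knobs -- check the shipped config for the real values.
model:
  _component_: torchtune.models.llama3_2_vision.lora_llama3_2_vision_11b  # assumed builder name
  lora_attn_modules: ['q_proj', 'v_proj', 'output_proj']
  apply_lora_to_mlp: True
  lora_rank: 8
  lora_alpha: 16
```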
Before training, update the WandB entity in your YAML files:

```yaml
metric_logger:
  _component_: torchtune.training.metric_logging.WandBLogger
  project: llama3_2_w2_extraction
  entity: <your_wandb_entity>  # Update this
```
Full Parameter Fine-tuning:

```bash
tune run full_finetune_single_device --config 11B_full_w2.yaml
```

LoRA Fine-tuning:

```bash
tune run lora_finetune_single_device --config 11B_lora_w2.yaml
```
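If you need to trade speed for memory, torchtune accepts `key=value` overrides on the command line, so you can adjust settings without editing the YAML; the key names below are standard recipe options and assumed to match these configs:

```bash
# Example CLI overrides (key names assumed to match the shipped configs).
tune run lora_finetune_single_device --config 11B_lora_w2.yaml \
  batch_size=1 gradient_accumulation_steps=8
```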
Note: The VQA dataset component in torchtune is pre-configured to handle the multimodal format, eliminating the need for custom preprocessors.
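Concretely, the dataset block in the configs has roughly this shape; treat it as an illustrative sketch tied to the 30/70 split prepared earlier (the component path is an assumption), not the literal contents of the YAML files:

```yaml
# Illustrative dataset block -- the shipped configs are the source of truth.
dataset:
  _component_: torchtune.datasets.multimodal.vqa_dataset  # assumed component path
  data_files:
    train: fake_w2_us_tax_form_dataset_train30_test70/train  # see the configuration note above
```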
Start a vLLM server with your fine-tuned model.

For the LoRA model:

```bash
CUDA_VISIBLE_DEVICES="0,1" python -m vllm.entrypoints.openai.api_server --model ./outputs/Llama-3.2-11B-Instruct-w2-lora/epoch_4/ --port 8003 --max-model-len 128000 --tensor-parallel-size 2
```

For the full fine-tuned model:

```bash
CUDA_VISIBLE_DEVICES="0,1" python -m vllm.entrypoints.openai.api_server --model ./outputs/Llama-3.2-11B-Instruct-w2-full/epoch_4/ --port 8003 --max-model-len 128000 --tensor-parallel-size 2
```
```bash
python evaluate.py --server_url http://localhost:8003 --model <model_path> --structured --dataset fake_w2_us_tax_form_dataset_train30_test70/test --limit 200
```
Install llama-verifications for standard benchmarks:

```bash
pip install llama-verifications
```
Run the benchmark evaluation:

```bash
uvx llama-verifications run-benchmarks \
  --benchmarks mmlu-pro-cot,gpqa,gpqa-cot-diamond \
  --provider http://localhost:8003/v1 \
  --model <model_path> \
  --continue-on-failure \
  --max-parallel-generations 100
```
For additional benchmarks using lm-eval:
With the vLLM backend:

```bash
CUDA_VISIBLE_DEVICES=0,1 lm_eval --model vllm \
  --model_args pretrained=<model_path>,tensor_parallel_size=2,dtype=auto,gpu_memory_utilization=0.9 \
  --tasks gsm8k_cot_llama \
  --batch_size auto \
  --seed 4242
```
With the transformers backend:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch -m lm_eval --model hf-multimodal \
  --model_args pretrained=<model_path> \
  --tasks chartqa_llama_90 \
  --batch_size 16 \
  --seed 4242 \
  --log_samples
```
W-2 extraction accuracy (%) on the held-out test split (FPFT = full parameter fine-tuning; "int4" denotes an int4-quantized variant of the FPFT model):

| Benchmark | 11B bf16 (Baseline) | LoRA | FPFT int4 | FPFT | 90B bf16 |
|---|---|---|---|---|---|
| W-2 extraction accuracy | 58 | 72 | 96 | 97 | N/A |
General-capability benchmarks:

| Benchmark | 11B bf16 (Baseline) | LoRA | FPFT int4 | FPFT | 90B bf16 |
|---|---|---|---|---|---|
| bfclv3 | 39.87 | 39.87 | 34.67 | 39.85 | N/A |
| docvqa | 86.88 | 85.08 | 78.95 | 86.3 | N/A |
| gpqa-cot-diamond | 27.78 | 27.78 | 28 | 26 | N/A |
| ifeval | 74.79 | 74.78 | 74.42 | 74.54 | N/A |
| mmlu-pro-cot | 48.43 | 48.13 | 46.14 | 48.33 | N/A |
Additional benchmarks (lm-eval tasks):

| Benchmark | 11B bf16 (Baseline) | LoRA | FPFT int4 | FPFT | 90B bf16 |
|---|---|---|---|---|---|
| gsm8k_cot_llama_strict | 85.29 | N/A | N/A | 85.29 | N/A |
| gsm8k_cot_llama_flexible | 85.44 | N/A | N/A | 85.44 | N/A |
| chartqa_llama_90_exact | 0 | N/A | N/A | 0 | 3.8 |
| chartqa_llama_90_relaxed | 34.16 | N/A | N/A | 35.58 | 44.12 |
| chartqa_llama_90_anywhere | 43.53 | N/A | N/A | 46.52 | 47.44 |
Note: Training loss curves and memory consumption graphs will be added here based on WandB logging data.
You can benchmark against the Llama API for comparison:

```bash
LLAMA_API_KEY="<your_api_key>" python evaluate.py \
  --server_url https://api.llama.com/compat \
  --limit 100 \
  --model Llama-4-Maverick-17B-128E-Instruct-FP8 \
  --structured \
  --max_workers 50 \
  --dataset fake_w2_us_tax_form_dataset_train30_test70/test
```
- Data Preparation: Ensure your dataset format matches the expected structure. The preparation script handles common formatting issues.
- Configuration Management: Always update paths in YAML files when changing directory structures.
- Memory Management: Use the PagedAdamW8bit optimizer for full parameter fine-tuning to reduce memory usage (see the sketch after this list).
- Evaluation Strategy: Evaluate both task-specific and general capabilities to understand trade-offs.
- Monitoring: Use WandB for comprehensive training monitoring and comparison.
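For the memory note above: torchtune selects the optimizer via an `optimizer._component_` key, so switching to the paged 8-bit optimizer is a config change along these lines (a sketch; the learning rate and surrounding keys are assumptions, not the shipped values):

```yaml
# Illustrative optimizer block for full parameter fine-tuning -- confirm against 11B_full_w2.yaml.
optimizer:
  _component_: bitsandbytes.optim.PagedAdamW8bit
  lr: 2e-5  # assumed value
```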
Common issues:

- CUDA Out of Memory: Reduce the batch size, enable gradient checkpointing, or use LoRA instead of full fine-tuning.
- Dataset Path Errors: Verify that dataset paths in YAML files match your actual directory structure.
- Model Download Issues: Ensure you're logged into HuggingFace and have access to the Llama models.
- vLLM Server Connection: Check that the server is running and accessible on the specified port.