Adding all packages to pip install

Beto de Paola, 3 weeks ago
Commit d33139246a
1 changed file with 24 additions and 13 deletions

getting-started/finetuning/vision/README.md (+24 −13)

@@ -1,15 +1,15 @@
-# Fine-tuning Llama 3.2 11B Vision for Structured Document Extraction
+# Fine-tuning Llama 3.2 11B Vision for Structured Data Extraction
 
-This recipe demonstrates how to fine-tune Llama 3.2 11B Vision model on a synthetic W-2 tax form dataset for structured information extraction. The tutorial compares LoRA (Low-Rank Adaptation) and full parameter fine-tuning approaches, evaluating their trade-offs in terms of accuracy, memory consumption, and computational requirements.
+This recipe demonstrates how to fine-tune Llama 3.2 11B Vision model on a synthetic W-2 tax form dataset for structured data extraction. The tutorial compares LoRA (Low-Rank Adaptation) and full parameter fine-tuning approaches, evaluating their trade-offs in terms of accuracy, memory consumption, and computational requirements.
 
 ## Objectives
-- Showcase how to fine-tune and evaluate on a specific document extraction use case
+- Showcase how to fine-tune and evaluate on a specific extraction use case
 - Demonstrate custom benchmarking for structured output tasks
 - Compare trade-offs between LoRA and Full Parameter Fine-tuning on both task-specific and general benchmarks
 - Provide guidance on data preparation, training configuration, and evaluation methodologies
 
 ## Prerequisites
-- CUDA-compatible GPU with at least 40GB VRAM (H100 recommended)
+- CUDA-compatible GPU with at least 40GB VRAM
 - HuggingFace account with access to Llama models
 - Python 3.10+
 
@@ -19,13 +19,13 @@ This recipe demonstrates how to fine-tune Llama 3.2 11B Vision model on a synthe
 ```bash
 git clone git@github.com:meta-llama/llama-cookbook.git
 cd llama-cookbook/getting-started/finetuning/vision
-conda create -n image-ft python=3.10 -y
+conda create -n image-ft python=3.12 -y
 conda activate image-ft
 ```
 
 ### Dependencies Installation
 ```bash
-pip install -r requirements.txt
+pip install torchtune torch torchvision torchaudio torchao bitsandbytes transformers==4.51.1 accelerate vllm==0.9.2 lm_eval wandb
 ```
 
 Install torchtune nightly for the latest vision model support:
@@ -45,7 +45,7 @@ The dataset contains 2,000 examples of synthetic W-2 forms with three splits: tr
 The preparation script:
 1. Reshuffles the train/test splits according to the specified ratio
 2. Removes unnecessary JSON structure wrappers from ground truth
-3. Adds standardized prompts for training consistency
+3. Adds standardized prompts for training
 
 ```bash
 python prepare_w2_dataset.py --train-ratio 0.3
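
The implementation of prepare_w2_dataset.py is not shown in this diff. As a rough, hypothetical sketch of the re-split in step 1 above, pooling and re-splitting could be done with the HuggingFace `datasets` library along these lines (the function name and flow are illustrative assumptions, not the script's actual code):

```python
# Hypothetical sketch of step 1 (re-splitting by --train-ratio); the actual
# prepare_w2_dataset.py may differ. Assumes the HuggingFace `datasets` library.
from datasets import DatasetDict, concatenate_datasets, load_dataset

def resplit(dataset_name: str, train_ratio: float, seed: int = 42) -> DatasetDict:
    ds = load_dataset(dataset_name)
    # Pool all existing splits, shuffle, then re-split by the requested ratio.
    pooled = concatenate_datasets([ds[split] for split in ds]).shuffle(seed=seed)
    split = pooled.train_test_split(train_size=train_ratio, seed=seed)
    return DatasetDict(train=split["train"], test=split["test"])

# e.g. resplit("<w2_dataset_name>", train_ratio=0.3)
```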
@@ -83,7 +83,7 @@ CUDA_VISIBLE_DEVICES="0,1" python -m vllm.entrypoints.openai.api_server --model
 
 ### Run Baseline Evaluation
 ```bash
-python evaluate.py --server_url http://localhost:8001 --model Llama-3.2-11B-Vision-Instruct/ --structured --dataset fake_w2_us_tax_form_dataset_train30_test70/test --limit 200
+python evaluate.py --server_url http://localhost:8001/v1 --model Llama-3.2-11B-Vision-Instruct/ --structured --dataset fake_w2_us_tax_form_dataset_train30_test70/test --limit 100 --max_workers 25
 ```
 
 ## Fine-tuning
@@ -141,17 +141,26 @@ Start a vLLM server with your fine-tuned model:
 
 **For LoRA model:**
 ```bash
-CUDA_VISIBLE_DEVICES="0,1" python -m vllm.entrypoints.openai.api_server --model ./outputs/Llama-3.2-11B-Instruct-w2-lora/epoch_4/ --port 8003 --max-model-len 128000 --tensor-parallel-size 2
+CUDA_VISIBLE_DEVICES="0,1" python -m vllm.entrypoints.openai.api_server --model ./outputs/Llama-3.2-11B-Instruct-w2-lora/epoch_4/ --port 8001 --max-model-len 65000 --tensor-parallel-size 2 --max-num-seqs 10
 ```
 
 **For full fine-tuned model:**
 ```bash
-CUDA_VISIBLE_DEVICES="0,1" python -m vllm.entrypoints.openai.api_server --model ./outputs/Llama-3.2-11B-Instruct-w2-full/epoch_4/ --port 8003 --max-model-len 128000 --tensor-parallel-size 2
+CUDA_VISIBLE_DEVICES="0,1" python -m vllm.entrypoints.openai.api_server --model ./outputs/Llama-3.2-11B-Instruct-w2-full/epoch_4/ --port 8001 --max-model-len 65000 --tensor-parallel-size 2
 ```
 
 ### Task-Specific Evaluation
 ```bash
-python evaluate.py --server_url http://localhost:8003 --model <model_path> --structured --dataset fake_w2_us_tax_form_dataset_train30_test70/test --limit 200
+python evaluate.py --server_url http://localhost:8001/v1 --model ./outputs/Llama-3.2-11B-Instruct-w2-full/epoch_4/ --structured --dataset fake_w2_us_tax_form_dataset_train30_test70/test --limit 100 --max_workers 25
+```
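
With `--structured`, the evaluation compares the model's JSON output field by field against the W-2 ground truth. The exact metric lives in evaluate.py; a minimal, hypothetical field-level exact-match score could look like this:

```python
# Hypothetical field-level exact-match metric for structured extraction;
# the scoring actually implemented in evaluate.py may differ.
import json

def field_accuracy(prediction: str, ground_truth: dict) -> float:
    try:
        predicted = json.loads(prediction)
    except json.JSONDecodeError:
        return 0.0  # unparseable output scores zero
    hits = sum(
        str(predicted.get(key, "")).strip() == str(value).strip()
        for key, value in ground_truth.items()
    )
    return hits / len(ground_truth)
```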
+
+
+### Cloud Evaluation Setup
+
+To evaluate bigger models, such as Llama 4 Maverick, we leverage cloud providers that serve these models, such as together.ai. Any OpenAI-compatible provider should work out of the box with our evaluation script, as it uses the OpenAI SDK.
+
+```bash
+TOGETHER_API_KEY=<your_api_key> python evaluate.py --server_url https://api.together.xyz/v1 --model meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8  --structured --dataset fake_w2_us_tax_form_dataset_train30_test70/test --limit 100 --max_workers 25
 ```
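
Because evaluate.py talks to the endpoint through the OpenAI SDK, pointing it at a hosted provider is just a matter of changing `--server_url` and the API key. A minimal sketch of that client pattern (the actual request construction in evaluate.py may differ):

```python
# Minimal sketch of calling an OpenAI-compatible endpoint (here together.ai)
# via the OpenAI SDK; evaluate.py's actual request logic may differ.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",   # any OpenAI-compatible --server_url
    api_key=os.environ["TOGETHER_API_KEY"],
)

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the W-2 fields as JSON."},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,<BASE64_W2_IMAGE>"}},
        ],
    }],
)
print(response.choices[0].message.content)
```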
 
 ### General Benchmark Evaluation
@@ -191,7 +200,9 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch -m lm_eval --model hf-multimodal
     --tasks chartqa_llama_90 \
     --batch_size 16 \
     --seed 4242 \
-    --log_samples
+    --log_samples \
+    --output_path results
+
 ```
 
 ## Results
@@ -348,7 +359,7 @@ You can benchmark against the Llama API for comparison:
 
 ```bash
 LLAMA_API_KEY="<your_api_key>" python evaluate.py \
-  --server_url https://api.llama.com/compat \
+  --server_url https://api.llama.com/compat/v1 \
   --limit 100 \
   --model Llama-4-Maverick-17B-128E-Instruct-FP8 \
   --structured \