Kai Wu 7 months ago
parent commit 3985d0732a

+ 0 - 2
recipes/quickstart/finetuning/datasets/ocrvqa_dataset.py

@@ -19,9 +19,7 @@ def replace_target(target,seq):
             seq[i],seq[i+1],seq[i+2] = -100,-100,-100
     return seq
 def tokenize_dialogs(dialogs, images, processor):
-    # If vocab size is above 128000, use the chat template to generate the tokens as it is from Llama 3 family models
     text_prompt = processor.apply_chat_template(dialogs)
-    #print("text_prompt",text_prompt)
     batch = processor(images=images, text=text_prompt,padding = True, return_tensors="pt")
     label_list = []
     for i in range(len(batch["input_ids"])):
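
The `replace_target` helper in the hunk above masks matched target token spans with `-100`. A minimal sketch, assuming standard PyTorch behavior and not taken from the repo, of why `-100` works as the mask value:

```python
# Sketch (not the repo's code): PyTorch's cross-entropy loss skips any
# position whose target equals ignore_index, which defaults to -100,
# so tokens masked this way contribute no loss.
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)                 # 4 token positions, vocab size 10
labels = torch.tensor([3, -100, -100, 7])   # middle positions masked out

loss = F.cross_entropy(logits, labels)      # ignore_index defaults to -100
print(loss)                                 # averaged over positions 0 and 3 only
```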

+ 4 - 2
recipes/quickstart/finetuning/finetune_vision_model.md

@@ -1,5 +1,7 @@
 ## Fine-Tuning Meta Llama Multi Modal Models recipe
-This recipe steps you through how to finetune a Llama 3.2 vision model on the VQA task using the [OCRVQA](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron/viewer/ocrvqa?row=0) dataset.
+This recipe steps you through how to finetune a Llama 3.2 vision model on the OCR VQA task using the [OCRVQA](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron/viewer/ocrvqa?row=0) dataset.
+
+**Disclaimer**: Since our vision models already have very good OCR ability, we use the OCRVQA dataset here only to demonstrate the steps needed to fine-tune our vision models.
 
 ### Fine-tuning steps
 
@@ -20,7 +22,7 @@ For **LoRA finetuning with FSDP**, we can run the following code:
 
 For more details about the finetuning configurations, please read the [finetuning readme](./README.md).
 
-### How to use custom dataset to fine-tune vision model
+### How to use a custom dataset to fine-tune a vision model
 
 In order to use a custom dataset, please follow the steps below:
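
The steps themselves are cut off by the hunk boundary above. As a rough sketch of the interface a custom dataset module follows, mirroring the shape of `ocrvqa_dataset.py` from this commit (the module name `my_vision_dataset.py`, the dataset ID, and the `dialog`/`image` field names are placeholders; the exact contract is described in the finetuning README):

```python
# my_vision_dataset.py -- hypothetical module; a minimal sketch that mirrors
# ocrvqa_dataset.py. Field names and the dataset ID are placeholders,
# not part of the recipe.
from datasets import load_dataset

def get_custom_dataset(dataset_config, processor, split):
    # Load any image+text dataset; "my_org/my_vqa" is a placeholder ID.
    return load_dataset("my_org/my_vqa", split=split)

def get_data_collator(processor):
    # Return a callable that batches raw samples into model inputs,
    # analogous to tokenize_dialogs in ocrvqa_dataset.py.
    def collate(samples):
        dialogs = [s["dialog"] for s in samples]   # placeholder field names
        images = [[s["image"]] for s in samples]
        text = processor.apply_chat_template(dialogs)
        batch = processor(images=images, text=text,
                          padding=True, return_tensors="pt")
        batch["labels"] = batch["input_ids"].clone()  # mask with -100 as needed
        return batch
    return collate
```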