
Merge remote-tracking branch 'origin/main' into fix/unit_test_3.2

Matthias Reso, 6 months ago
commit d58dea23e7
+ 1 - 1
recipes/quickstart/finetuning/finetune_vision_model.md

@@ -28,6 +28,6 @@ In order to use a custom dataset, please follow the steps below:
 
 
 1. Create a new dataset python file under the `recipes/quickstart/finetuning/dataset` folder.
 2. In this python file, you need to define a `get_custom_dataset(dataset_config, processor, split, split_ratio=0.9)` function that handles the data loading.
-3. In this python file, you need to define a `get_data_collator(processor)` class that returns a custom data collator that can be used by the Pytorch Data Loader.
+3. In this python file, you need to define a `get_data_collator(processor)` function that returns a custom data collator that can be used by the Pytorch Data Loader.
 4. This custom data collator must have a `__call__(self, samples)` method that converts the image and text samples into the actual inputs that the vision model expects.
 5. Run the `torchrun` command from the section above, changing `--custom_dataset.file` to the new dataset python file and adjusting the learning rate accordingly.
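
A minimal sketch of such a dataset file is shown below for reference. It is illustrative only, not the example shipped with the repository: the dataset name, its field layout (`images`, `texts`), and the `CustomDataCollator` class are assumptions, and prompt formatting details (e.g. image placeholder tokens) depend on the processor that the finetuning script passes in.

```python
# Hypothetical custom dataset file for vision finetuning (illustrative sketch).
from datasets import load_dataset


def get_custom_dataset(dataset_config, processor, split, split_ratio=0.9):
    # Load an image+text dataset and split it into train/test portions.
    # The dataset name and field names are assumptions for this sketch.
    dataset = load_dataset("HuggingFaceM4/the_cauldron", "ocrvqa", split="train")
    dataset = dataset.train_test_split(test_size=1 - split_ratio, seed=42)
    return dataset["train" if split == "train" else "test"]


class CustomDataCollator:
    def __init__(self, processor):
        self.processor = processor

    def __call__(self, samples):
        # Turn raw image/text samples into model inputs; a real recipe also
        # needs model-specific prompt formatting, which is omitted here.
        images = [sample["images"][0] for sample in samples]
        texts = [sample["texts"][0]["user"] for sample in samples]
        batch = self.processor(images=images, text=texts,
                               padding=True, return_tensors="pt")
        batch["labels"] = batch["input_ids"].clone()
        return batch


def get_data_collator(processor):
    # The returned callable is used as the PyTorch DataLoader's collate_fn.
    return CustomDataCollator(processor)
```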

+ 1 - 1
recipes/use_cases/multilingual/README.md

@@ -1,7 +1,7 @@
 # Extending Llama to a new language
 Authored by: Sarvam team
 In this recipe, we will see how to add a new language to the Llama family of models. The steps are quite general and can be easily adapted to other models as well. Using this recipe, you should be able to replicate the findings of [OpenHathi](https://huggingface.co/sarvamai/OpenHathi-7B-Hi-v0.1-Base).
-Please read more about OpenHathi [here](https://web.archive.org/web/20240418103408/https://www.sarvam.ai/blog/announcing-openhathi-series)
+Please read more about OpenHathi [here](https://x.com/SarvamAI/status/1734645628288831557)
 
 
 ## Data
 The original OpenHathi model uses a combination of [Sangraha](https://huggingface.co/datasets/ai4bharat/sangraha) and Wikipedia as its primary data sources. If the reader is interested in using these sources, they would also have to preprocess the data: clean, filter, and deduplicate. See [Setu](https://github.com/AI4Bharat/setu) for an easy way to do this at scale.
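
Setu's pipeline is not reproduced here, but as a rough illustration of what the clean/filter/deduplicate step involves, a toy exact-deduplication pass over raw documents might look like the sketch below (the normalization rule and length threshold are arbitrary assumptions):

```python
# Toy clean/filter/deduplicate pass (illustrative only; use Setu or similar at scale).
import hashlib
import re


def preprocess(documents):
    seen = set()
    for doc in documents:
        text = re.sub(r"\s+", " ", doc).strip()   # clean: normalize whitespace
        if len(text.split()) < 20:                # filter: drop very short documents
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:                        # deduplicate: skip exact repeats
            continue
        seen.add(digest)
        yield text
```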

+ 1 - 1
src/llama_recipes/utils/train_utils.py

@@ -151,11 +151,11 @@ def train(model, train_dataloader,eval_dataloader, tokenizer, optimizer, lr_sche
                                 batch[key] = batch[key].to('cuda:0')
                     with autocast():
                         loss = model(**batch).loss
+                    total_loss += loss.detach().float()
                     loss = loss / gradient_accumulation_steps
                     if train_config.save_metrics:
                         train_step_loss.append(loss.detach().float().item())
                         train_step_perplexity.append(float(torch.exp(loss.detach().float())))
-                    total_loss += loss.detach().float()
                     if train_config.use_fp16:
                         # if fp16 is enabled, use gradient scaler to handle gradient update
                         scaler.scale(loss).backward()
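
The reordering in this hunk moves `total_loss += loss.detach().float()` ahead of the division by `gradient_accumulation_steps`, so the accumulated value reflects the full per-batch loss rather than the down-scaled one used for backpropagation. A minimal sketch of that pattern is shown below; it is illustrative only, assumes a list of ready-made batches, and omits mixed precision, FSDP, and metric saving:

```python
# Gradient accumulation sketch: log the unscaled loss, backpropagate the scaled one.
def train_epoch(model, batches, optimizer, gradient_accumulation_steps=4):
    total_loss = 0.0
    for step, batch in enumerate(batches):
        loss = model(**batch).loss
        total_loss += loss.detach().float()        # accumulate the full loss for reporting
        loss = loss / gradient_accumulation_steps  # scale only what is backpropagated
        loss.backward()
        if (step + 1) % gradient_accumulation_steps == 0 or step == len(batches) - 1:
            optimizer.step()
            optimizer.zero_grad()
    return float(total_loss) / len(batches)        # average per-batch training loss
```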