# TLT YOLO v4 Object Detection
---
## Learning Objectives
This notebook shows an example use case of YOLO v4 object detection using the Transfer Learning Toolkit (TLT). You will learn how to leverage the simplicity and convenience of TLT to:

* Train a model with a ResNet-18 backbone on the KITTI detection dataset
* Prune the fine-tuned model
* Retrain the pruned model to recover lost accuracy
* Export the pruned model
* Run inference on the trained model
* Export the pruned and retrained model to a .etlt file for deployment to DeepStream

### Table of Contents

0. [Set up env variables](#head-0)
1. [Prepare dataset and pre-trained model](#head-1) <br>
    1.1 [Download pre-trained model](#head-1-1) <br>
2. [Provide training specification](#head-2)
3. [Run TLT training](#head-3)
4. [Evaluate trained models](#head-4)
5. [Prune trained models](#head-5)
6. [Retrain pruned models](#head-6)
7. [Evaluate retrained model](#head-7)
8. [Visualize inferences](#head-8)
9. [Deploy](#head-9)

# Transfer Learning with TLT

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is retrained for use on a different task.

Transfer Learning Toolkit (TLT) is a simple, easy-to-use, Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<img align="center" src="https://developer.nvidia.com/sites/default/files/akamai/embedded-transfer-learning-toolkit-software-stack-1200x670px.png" width="720"> 

Before TLT can be used, you need to register at ngc.nvidia.com and generate an API key. The steps are:
- From your browser, visit `ngc.nvidia.com`.
- Click on `Welcome Guest` to open a dropdown menu, then click on `Sign In/Sign Up`.
- Click on the `continue` button where `NVIDIA Account (use existing or create a new NVIDIA ac-)` is written.
- Click on `Create account` and register. You can then log in with your new account credentials.
- At the top right corner, click on your `username` to open a dropdown menu, then click on `Setup`.
- Click on the `Get API Key` button.
- At the top right corner, click on the `Generate API Key` button. In the dialog box that appears, click on the `confirm` button.
- Finally, copy your generated API key and username, and save them somewhere on your local system.

<img align="center" src="images/ngc_setup_key.PNG" width="600"> 
<img align="center" src="images/ngc_key.PNG" width="700">

## API Key

- Your API key represents your credentials
  - Used for programmatic interaction (e.g., docker, REST API, etc.)
  - Uniquely identifies you (think “Username & Password”)
  - There can be only one (regenerating your API key invalidates the old one)
- Programmatic interface at `nvcr.io`: Use API Key

## 0. Set up env variables <a class="anchor" id="head-0"></a>

Copy your API key from where you saved it and paste it between the single quotes after `%set_env KEY=` in the cell below, replacing the placeholder text.


In [None]:
%set_env KEY='place your ngc api key here'
%set_env GPU_INDEX=0
%set_env USER_EXPERIMENT_DIR=/workspace/tlt-experiments/yolo_v4
%set_env DATA_DOWNLOAD_DIR=/workspace/tlt-experiments/data
%set_env SPECS_DIR=/workspace/examples/yolo_v4/specs
!mkdir -p $DATA_DOWNLOAD_DIR
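
Before moving on, it can help to confirm that every variable above is actually set and that `KEY` no longer holds the placeholder text. The following is a minimal sketch, not part of the original notebook; the helper name `find_unset_vars` is hypothetical.

```python
import os

# Variables set in the cell above.
REQUIRED_VARS = ["KEY", "GPU_INDEX", "USER_EXPERIMENT_DIR", "DATA_DOWNLOAD_DIR", "SPECS_DIR"]
# Placeholder text shipped with the notebook for KEY.
PLACEHOLDER = "place your ngc api key here"

def find_unset_vars(env, names=REQUIRED_VARS):
    """Return the names that are missing, empty, or still hold the placeholder."""
    problems = []
    for name in names:
        # Ignore surrounding whitespace and quotes when comparing values.
        value = env.get(name, "").strip().strip("'\"")
        if not value or value == PLACEHOLDER:
            problems.append(name)
    return problems

problems = find_unset_vars(os.environ)
if problems:
    print("Please set:", ", ".join(problems))
else:
    print("All required variables are set.")
```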

## 1. Prepare dataset and pre-trained model <a class="anchor" id="head-1"></a>

We will use the KITTI detection dataset for this tutorial. For more details, please visit
http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d. If you intend to run this notebook on your local workstation without using a container, please download the KITTI detection images (http://www.cvlibs.net/download.php?file=data_object_image_2.zip) and labels (http://www.cvlibs.net/download.php?file=data_object_label_2.zip) to `$DATA_DOWNLOAD_DIR` (`/workspace/tlt-experiments/data`).

- Check that the dataset is present

In [None]:
!mkdir -p $DATA_DOWNLOAD_DIR
!if [ ! -f $DATA_DOWNLOAD_DIR/data_object_image_2.zip ]; then echo 'Image zip file not found, please download.'; else echo 'Found Image zip file.';fi
!if [ ! -f $DATA_DOWNLOAD_DIR/data_object_label_2.zip ]; then echo 'Label zip file not found, please download.'; else echo 'Found Labels zip file.';fi

- Unpack the zip files and verify

In [None]:
!unzip -u $DATA_DOWNLOAD_DIR/data_object_image_2.zip -d $DATA_DOWNLOAD_DIR
!unzip -u $DATA_DOWNLOAD_DIR/data_object_label_2.zip -d $DATA_DOWNLOAD_DIR

In [None]:
!ls -l $DATA_DOWNLOAD_DIR/

- Generate a validation set from the training dataset

In [None]:
!python /workspace/tlt-experiments/source_code/generate_val_dataset.py --input_image_dir=$DATA_DOWNLOAD_DIR/training/image_2 \
                                       --input_label_dir=$DATA_DOWNLOAD_DIR/training/label_2 \
                                       --output_dir=$DATA_DOWNLOAD_DIR/val
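
The internals of `generate_val_dataset.py` are not shown in this notebook. As a rough illustration of what a train/validation split involves, the sketch below holds out a reproducible random subset of sample IDs. The function name, the `val_fraction` default, and the seed are all assumptions made for illustration, not the script's actual behavior.

```python
import random

def split_train_val(sample_ids, val_fraction=0.1, seed=42):
    """Split sample IDs into (train, val) lists using a seeded shuffle.

    val_fraction and seed are illustrative defaults, not values taken
    from generate_val_dataset.py.
    """
    ids = sorted(sample_ids)          # start from a deterministic order
    rng = random.Random(seed)         # local RNG so the split is reproducible
    rng.shuffle(ids)
    num_val = int(len(ids) * val_fraction)
    return sorted(ids[num_val:]), sorted(ids[:num_val])

# KITTI-style file stems such as 000000, 000001, ...
ids = ["%06d" % i for i in range(20)]
train_ids, val_ids = split_train_val(ids, val_fraction=0.2)
print(len(train_ids), "training samples,", len(val_ids), "validation samples")
```

Each ID would correspond to one image in `image_2` and its matching label file in `label_2`, so both files must move together when a sample is assigned to the validation set.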

- Additionally, if you already have your own dataset in a volume (or folder), you can mount the volume on `DATA_DOWNLOAD_DIR` (or create a soft link). For example:

```bash
# if your dataset is in /dev/sdc1
mount /dev/sdc1 $DATA_DOWNLOAD_DIR

# if your dataset is in folder /var/dataset
ln -sf /var/dataset $DATA_DOWNLOAD_DIR
```

- You will also need to uncomment and run the cell below to generate the best anchor shapes.
- The anchor shapes generated by this script are sorted. Write the first 3 into `small_anchor_shape` in the config file, the middle 3 into `mid_anchor_shape`, and the last 3 into `big_anchor_shape`.

In [None]:
# !yolo_v4 kmeans -l $DATA_DOWNLOAD_DIR/training/label_2 \
#                 -i $DATA_DOWNLOAD_DIR/training/image_2 \
#                 -n 9 \
#                 -x 1248 \
#                 -y 384
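
The grouping rule above (first three, middle three, last three) can be sketched as follows. The anchor values are made up for illustration, and the quoted-list string produced by `format_group` is an assumption about how the spec file expects the shapes to be written; check the existing entries in your spec file before pasting.

```python
def group_anchor_shapes(anchors):
    """Split nine already-sorted (width, height) anchors into three groups."""
    if len(anchors) != 9:
        raise ValueError("expected 9 anchors, got %d" % len(anchors))
    # The script output is described as sorted, so the order is kept as-is.
    return {
        "small_anchor_shape": anchors[0:3],
        "mid_anchor_shape": anchors[3:6],
        "big_anchor_shape": anchors[6:9],
    }

def format_group(shapes):
    """Render one group as text, e.g. [(10.00, 12.00), (20.00, 25.00)]."""
    return "[" + ", ".join("(%.2f, %.2f)" % (w, h) for w, h in shapes) + "]"

# Made-up anchors standing in for the output of the kmeans cell.
example_anchors = [
    (12.0, 10.0), (20.0, 15.0), (30.0, 22.0),
    (45.0, 30.0), (60.0, 40.0), (80.0, 55.0),
    (110.0, 70.0), (160.0, 95.0), (240.0, 140.0),
]
for field, shapes in group_anchor_shapes(example_anchors).items():
    print('%s: "%s"' % (field, format_group(shapes)))
```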


### 1.1 Download pre-trained model <a class="anchor" id="head-1-1"></a>

We will use the NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](https://ngc.nvidia.com) and click SETUP on the navigation bar.

In [None]:
!ngc registry model list nvidia/tlt_pretrained_object_detection:*

In [None]:
!mkdir -p $USER_EXPERIMENT_DIR/pretrained_resnet18/

- Pull the pretrained model from NGC

In [None]:
!ngc registry model download-version nvidia/tlt_pretrained_object_detection:resnet18 --dest $USER_EXPERIMENT_DIR/pretrained_resnet18

- Check that the model was downloaded into the directory

In [None]:
!ls -l $USER_EXPERIMENT_DIR/pretrained_resnet18/tlt_pretrained_object_detection_vresnet18

## 2. Provide training specification <a class="anchor" id="head-2"></a>
* Augmentation parameters for on-the-fly data augmentation
* Training (hyper-)parameters such as batch size, number of epochs, learning rate, etc.
* Training dataset
* Validation dataset
* Pre-trained models

- Provide the pretrained model path by running the cell below, which substitutes `$USER_EXPERIMENT_DIR` for the `EXPERIMENT_DIR` placeholder in the spec file

In [None]:
!sed -i 's,EXPERIMENT_DIR,'"$USER_EXPERIMENT_DIR"',' $SPECS_DIR/yolo_v4_train_resnet18_kitti.txt

- To enable QAT training in the sample spec files, uncomment the lines in the cell below

In [None]:
# !sed -i "s/enable_qat: false/enable_qat: true/g" $SPECS_DIR/yolo_v4_train_resnet18_kitti.txt
# !sed -i "s/enable_qat: false/enable_qat: true/g" $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt

- By default, the sample spec file (`yolo_v4_train_resnet18_kitti.txt`) disables QAT training. To force non-QAT training, uncomment and run the cell below

In [None]:
# !sed -i "s/enable_qat: true/enable_qat: false/g" $SPECS_DIR/yolo_v4_train_resnet18_kitti.txt
# !sed -i "s/enable_qat: true/enable_qat: false/g" $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt

- Run the cell below to view the model spec configuration file. **Your task is to modify the hyper-parameters to achieve the desired accuracy**. You can access the `yolo_v4_train_resnet18_kitti.txt` file in the `specs` folder shown at the top left of JupyterLab. Remember to save the file with `Ctrl+S` after modifying it, then rerun the cell below to confirm that your changes are reflected.
- Note that in the spec file `arch` is set to `resnet` as the backbone for feature extraction. Other supported backbones include `"vgg", "darknet", "googlenet", "mobilenet_v1", "mobilenet_v2", "cspdarknet", and "squeezenet"`.

In [None]:
!cat $SPECS_DIR/yolo_v4_train_resnet18_kitti.txt

## 3. Run TLT training <a class="anchor" id="head-3"></a>
* Provide the sample spec file and the output directory location for models
* WARNING: training can take several hours to a full day to complete
- Parameter definitions:
     - -e: `spec file`; -k: `encoding key`;  -r: `result directory`; --gpus: `number of GPUs`
- To run on multiple GPUs, change `--gpus` to match the number of GPUs available in your machine (the cell below uses 2)

In [None]:
!mkdir -p $USER_EXPERIMENT_DIR/experiment_dir_unpruned

In [None]:
!yolo_v4 train -e $SPECS_DIR/yolo_v4_train_resnet18_kitti.txt \
               -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
               -k $KEY \
               --gpus 2

- To resume from a checkpoint, change `pretrain_model_path` to `resume_model_path` in the spec file and point it at the checkpoint
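
The spec edit for resuming can be sketched as a small text transformation. This assumes the field appears in the spec file as a quoted path on a single line (`pretrain_model_path: "..."`); the helper name and the checkpoint path in the example are hypothetical.

```python
import re

def switch_to_resume(spec_text, checkpoint_path):
    """Replace the pretrain_model_path entry with a resume_model_path entry.

    Assumes the spec stores the path as: pretrain_model_path: "<path>"
    """
    pattern = re.compile(r'pretrain_model_path\s*:\s*"[^"]*"')
    replacement = 'resume_model_path: "%s"' % checkpoint_path
    # A callable replacement avoids backslash-escape surprises in the path.
    new_text, count = pattern.subn(lambda match: replacement, spec_text)
    if count == 0:
        raise ValueError("pretrain_model_path not found in spec text")
    return new_text

# Made-up spec fragment and checkpoint path, for illustration only.
example_spec = 'training_config {\n  pretrain_model_path: "/path/to/pretrained.hdf5"\n}\n'
print(switch_to_resume(example_spec, "/path/to/weights/checkpoint.tlt"))
```

Working on a copy of the spec file keeps the original available for a fresh training run.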

In [None]:
print('Model for each epoch:')
print('---------------------')
!ls -ltrh $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights

- Now check the evaluation stats in the `csv file`, pick the model with the highest evaluation `accuracy`, and set `EPOCH` in the cell below to that model's epoch number.

In [None]:
!cat $USER_EXPERIMENT_DIR/experiment_dir_unpruned/yolov4_training_log_resnet18.csv
%set_env EPOCH=080
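
Picking the best epoch by eye is error-prone for long logs. The sketch below parses CSV text and returns the epoch with the highest metric value, zero-padded to three digits to match the `EPOCH=080` convention used above. The column names `epoch` and `mAP` are assumptions; check the header row of your log and pass the right names.

```python
import csv
import io
import math

def best_epoch(csv_text, metric="mAP", epoch_col="epoch"):
    """Return the zero-padded epoch whose metric value is highest.

    metric and epoch_col are assumed column names; adjust to the real header.
    """
    best = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            value = float((row.get(metric) or "").strip())
        except ValueError:
            continue  # rows without a numeric metric (no validation that epoch)
        if math.isnan(value):
            continue
        epoch = int(float(row[epoch_col]))
        if best is None or value > best[1]:
            best = (epoch, value)
    if best is None:
        raise ValueError("no numeric values found in column %r" % metric)
    return "%03d" % best[0]

# Made-up log contents, for illustration only.
example_log = "epoch,loss,mAP\n10,5.0,0.41\n20,4.0,nan\n30,3.5,0.62\n40,3.4,0.58\n"
print(best_epoch(example_log))
```

In the notebook, the CSV text could be read from the log file shown above and the result assigned with `os.environ['EPOCH'] = ...`.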

## 4. Evaluate trained models <a class="anchor" id="head-4"></a>

In [None]:
!yolo_v4 evaluate -e $SPECS_DIR/yolo_v4_train_resnet18_kitti.txt \
                  -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_$EPOCH.tlt \
                  -k $KEY

## 5. Prune trained models <a class="anchor" id="head-5"></a>
* Specify pre-trained model
* Equalization criterion (only applies to ResNets and MobileNets, as they have element-wise operations)
* Threshold for pruning.
* A key to save and load the model
* Output directory to store the model

Usually, you just need to adjust `-pth` (threshold) to trade off accuracy against model size. A higher `pth` gives you a smaller model (and thus higher inference speed) but worse accuracy. The right threshold depends on the dataset and the model. The `0.1` used in the cell below is just a starting point. If the retrained accuracy is good, you can increase this value to get smaller models. Otherwise, lower this value to get better accuracy.
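
To build intuition for why a higher threshold yields a smaller model, here is a simplified sketch of threshold-based channel pruning. It assumes each channel is scored by its weight norm, normalized by the largest norm in the layer, and dropped when the score falls below the threshold. This is an illustration of the general idea, not TLT's actual implementation, and the norm values are made up.

```python
def channels_to_keep(channel_norms, threshold):
    """Return indices of channels whose max-normalized norm meets the threshold."""
    max_norm = max(channel_norms)
    if max_norm == 0:
        return []  # every channel is all zeros, nothing worth keeping
    return [i for i, norm in enumerate(channel_norms)
            if norm / max_norm >= threshold]

# Made-up per-channel weight norms for one layer.
norms = [0.02, 0.40, 0.90, 0.07, 1.60, 0.25]
for pth in (0.1, 0.3, 0.5):
    kept = channels_to_keep(norms, pth)
    print("pth=%.1f keeps %d of %d channels: %s" % (pth, len(kept), len(norms), kept))
```

Raising the threshold removes more low-magnitude channels, which is why retraining is needed afterwards to recover accuracy.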

In [None]:
!mkdir -p $USER_EXPERIMENT_DIR/experiment_dir_pruned

In [None]:
!yolo_v4 prune -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_$EPOCH.tlt \
               -e $SPECS_DIR/yolo_v4_train_resnet18_kitti.txt \
               -o $USER_EXPERIMENT_DIR/experiment_dir_pruned/yolov4_resnet18_pruned.tlt \
               -eq intersection \
               -pth 0.1 \
               -k $KEY

In [None]:
!ls -rlt $USER_EXPERIMENT_DIR/experiment_dir_pruned/

## 6. Retrain pruned models <a class="anchor" id="head-6"></a>
* The model needs to be retrained to recover the accuracy lost during pruning.
- Run the cell below to view the retrain specification configuration file. You may need to modify the hyper-parameters to achieve the desired accuracy. You can access the `yolo_v4_retrain_resnet18_kitti.txt` file in the `specs` folder shown at the top left of JupyterLab. Remember to save the file with `Ctrl+S` after modifying it, then rerun the cell below to confirm that your changes are reflected.
- WARNING: training can take several hours to a full day to complete.

In [None]:
# Substitute the experiment directory into the retrain spec file, then print it.
# The retrain spec uses the newly pruned model as the pretrained weights.
!sed -i 's,EXPERIMENT_DIR,'"$USER_EXPERIMENT_DIR"',' $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt
!cat $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt

In [None]:
!mkdir -p $USER_EXPERIMENT_DIR/experiment_dir_retrain

In [None]:
# Retraining using the pruned model as pretrained weights 
!yolo_v4 train --gpus 2 \
               -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
               -r $USER_EXPERIMENT_DIR/experiment_dir_retrain \
               -k $KEY

- List the newly retrained models

In [None]:
!ls -rlt $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights

- Now check the evaluation stats in the `csv file`, pick the model with the highest evaluation `accuracy`, and set `EPOCH` in the cell below to that model's epoch number.

In [None]:
!cat $USER_EXPERIMENT_DIR/experiment_dir_retrain/yolov4_training_log_resnet18.csv
%set_env EPOCH=080

## 7. Evaluate retrained model <a class="anchor" id="head-7"></a>

In [None]:
!yolo_v4 evaluate -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
                  -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_resnet18_epoch_$EPOCH.tlt \
                  -k $KEY

## 8. Visualize inferences <a class="anchor" id="head-8"></a>
- In this section, we run the `yolo_v4 inference` tool to generate inferences with the trained models and visualize the results.
- Parameter definitions:
   - -m: `retrained model`; -e: `retrain spec file`; -k: `encoding key`; -b: `batch size`; -i: `test data dir`; -o: `output images`; -l: `frame by frame bbox labels output`

In [None]:
# Copy some test images
!mkdir -p /workspace/examples/yolo_v4/test_samples
!cp $DATA_DOWNLOAD_DIR/testing/image_2/00000* /workspace/examples/yolo_v4/test_samples/

In [None]:
# Running inference for detection on n images
!yolo_v4 inference -i /workspace/examples/yolo_v4/test_samples \
                   -o $USER_EXPERIMENT_DIR/yolo_infer_images \
                   -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
                   -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_resnet18_epoch_$EPOCH.tlt \
                   -l $USER_EXPERIMENT_DIR/yolo_infer_labels \
                   -k $KEY

The `yolo_v4 inference` tool produces two outputs:
1. Images with the detected bounding boxes overlaid, in `$USER_EXPERIMENT_DIR/yolo_infer_images`
2. Frame-by-frame bbox labels in KITTI format, in `$USER_EXPERIMENT_DIR/yolo_infer_labels`
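
The label files can also be inspected programmatically. The sketch below parses one line in the standard KITTI object label layout (class name, truncation, occlusion, alpha, four bbox coordinates, then seven 3D fields). Treating an extra 16th field as the detection confidence is an assumption about the tool's output, and the sample line is made up.

```python
def parse_kitti_line(line):
    """Parse one KITTI label line into class name, 2D bbox, and optional score."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError("expected at least 15 fields, got %d" % len(fields))
    return {
        "class_name": fields[0],
        # bbox is (left, top, right, bottom) in pixel coordinates
        "bbox": tuple(float(v) for v in fields[4:8]),
        # assumed: a 16th field, when present, holds the confidence score
        "score": float(fields[15]) if len(fields) > 15 else None,
    }

# Made-up detection line, for illustration only.
sample = "car 0.00 0 0.00 100.50 120.25 300.00 250.75 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.87"
detection = parse_kitti_line(sample)
print(detection["class_name"], detection["bbox"], detection["score"])
```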

In [None]:
# Simple grid visualizer
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

def visualize_images(image_dir, num_cols=4, num_images=10):
    """Show up to num_images images from $USER_EXPERIMENT_DIR/image_dir in a grid."""
    output_path = os.path.join(os.environ['USER_EXPERIMENT_DIR'], image_dir)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr 2-D even when the grid has a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)
    f.tight_layout()
    # Sort the file list so the grid order is the same on every run.
    a = sorted(os.path.join(output_path, image) for image in os.listdir(output_path)
               if os.path.splitext(image)[1].lower() in valid_image_ext)
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)

- Visualize the sample images.

In [None]:
OUTPUT_PATH = 'yolo_infer_images' # relative path from $USER_EXPERIMENT_DIR.
COLS = 3 # number of columns in the visualizer grid.
IMAGES = 9 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

## 9. Deploy! <a class="anchor" id="head-9"></a>

If you trained a non-QAT model, you may export in `FP32`, `FP16` or `INT8` mode using the code block below. For `INT8`, you need to provide a calibration image directory.

In [None]:
# yolo_v4 export will fail if the .etlt file already exists, so we clear the export folder first
!rm -rf $USER_EXPERIMENT_DIR/export
!mkdir -p $USER_EXPERIMENT_DIR/export
# Export in FP32 mode. Change --data_type to fp16 for FP16 mode
!yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_resnet18_epoch_$EPOCH.tlt \
                -k $KEY \
                -o $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.etlt \
                -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
                --batch_size 16 \
                --data_type fp32

# Uncomment to export in INT8 mode (generate calibration cache file). 
# !yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_resnet18_epoch_$EPOCH.tlt  \
#                 -o $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.etlt \
#                 -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
#                 -k $KEY \
#                 --cal_image_dir  $DATA_DOWNLOAD_DIR/testing/image_2 \
#                 --data_type int8 \
#                 --batch_size 16 \
#                 --batches 10 \
#                 --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
#                 --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile

`Note:` In this example, for ease of execution, we restrict the number of calibration batches to 10. TLT recommends using at least 10% of the training dataset for INT8 calibration.
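
To size `--batches` for the 10% recommendation, divide the number of calibration images by the batch size and round up. A minimal sketch follows; the training-set size of 6000 is a made-up example, so substitute the number of images in your own training split.

```python
import math

def calibration_batches(num_training_images, batch_size, fraction=0.10):
    """Number of batches needed to cover the given fraction of the training set."""
    num_images = math.ceil(num_training_images * fraction)
    return math.ceil(num_images / batch_size)

# Hypothetical training-set size, with the batch size used in the export cell.
print(calibration_batches(6000, 16))
```

The resulting value would replace `10` in the `--batches` argument of the INT8 export command.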

If you trained a QAT model, you may only export in INT8 mode, using the following code block. This generates an etlt file and the corresponding calibration cache. You can discard the calibration cache and use only the etlt file in tlt-converter or DeepStream for FP32 or FP16 mode, but note that this gives sub-optimal results. If you want to deploy in FP32 or FP16, you should disable QAT during training.

In [None]:
# Uncomment to export QAT model in INT8 mode (generate calibration cache file).
# !rm -rf $USER_EXPERIMENT_DIR/export
# !mkdir -p $USER_EXPERIMENT_DIR/export
# !yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_resnet18_epoch_$EPOCH.tlt  \
#                 -o $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.etlt \
#                 -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
#                 -k $KEY \
#                 --data_type int8 \
#                 --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin

In [None]:
print('Exported model:')
print('------------')
!ls -lh $USER_EXPERIMENT_DIR/export

Verify engine generation using the `tlt-converter` utility included with the docker.

The `tlt-converter` produces optimized TensorRT engines for the platform that it resides on. Therefore, to get maximum performance, please instantiate this docker and execute the `tlt-converter` command, with the exported `.etlt` file and calibration cache (for INT8 mode), on your target device. The converter utility included in this docker only works for x86 devices with discrete NVIDIA GPUs.

For Jetson devices, please download the converter for Jetson from the dev zone link [here](https://developer.nvidia.com/tlt-converter).

If you choose to integrate your model into DeepStream directly, you may do so by copying the exported `.etlt` file along with the calibration cache to the target device and updating the spec file that configures the `gst-nvinfer` element to point to this newly exported model. Usually this file is called `config_infer_primary.txt` for detection models and `config_infer_secondary_*.txt` for classification models.
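
As a rough sketch of the model-related entries in such a `gst-nvinfer` file, the snippet below renders a `[property]` section with Python's `configparser`. The keys `tlt-encoded-model`, `tlt-model-key`, `int8-calib-file`, and `network-mode` are `gst-nvinfer` properties, but the paths and key value are placeholders, and a working YOLO v4 configuration needs further entries (labels, class count, output parsing) that are not covered here. Treat it as a starting point to compare against the DeepStream documentation, not a complete config.

```python
import configparser
import io

def nvinfer_model_properties(etlt_path, key, calib_cache=None):
    """Build the model-related [property] entries for a gst-nvinfer config."""
    props = {"tlt-encoded-model": etlt_path, "tlt-model-key": key}
    if calib_cache:
        props["int8-calib-file"] = calib_cache
        props["network-mode"] = "1"  # 1 = INT8
    else:
        props["network-mode"] = "0"  # 0 = FP32 (2 would select FP16)
    return props

def render_config(props):
    """Render the entries as INI text under a [property] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key spelling exactly as given
    parser["property"] = props
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()

# Placeholder paths and key, for illustration only.
print(render_config(nvinfer_model_properties(
    "/path/to/model.etlt", "your_ngc_api_key", calib_cache="/path/to/cal.bin")))
```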

---
### Source

This notebook was adapted from examples within the NVIDIA TLT/TAO Docker container pulled from ngc.nvidia.com.

### Licensing 

This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0).