# TLT Image Classification 
---
## Learning Objectives
In this notebook, you will learn how to leverage the simplicity and convenience of TLT to:

* Take a pretrained ResNet-18 model and fine-tune it on a sample dataset converted from Pascal VOC
* Prune the finetuned model
* Retrain the pruned model to recover lost accuracy
* Run inference on the retrained model
* Export the pruned and retrained model to an `.etlt` file for deployment to DeepStream

### Table of Contents
This notebook shows an example use case for classification using the Transfer Learning Toolkit.

0. [Set up env variables](#head-0)
1. [Prepare dataset and pretrained model](#head-1)
    1. [Split the dataset into train/val/test](#head-1-1)
    2. [Download pre-trained model](#head-1-2)
2. [Provide training specification](#head-2)
3. [Run TLT training](#head-3)
4. [Evaluate trained models](#head-4)
5. [Prune trained models](#head-5)
6. [Retrain pruned models](#head-6)
7. [Testing the model](#head-7)
8. [Visualize inferences](#head-8)
9. [Export and Deploy!](#head-9)
    1. [Int8 Optimization](#head-9-1)
    2. [Generate TensorRT engine](#head-9-2)

# Transfer Learning with TLT

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which you take a model trained on one task and retrain it for a different task.

Transfer Learning Toolkit (TLT) is a simple, easy-to-use, Python-based AI toolkit for taking purpose-built AI models and customizing them with your own data.

<img align="center" src="https://developer.nvidia.com/sites/default/files/akamai/embedded-transfer-learning-toolkit-software-stack-1200x670px.png" width="720"> 

Before you can use TLT, you need to register at ngc.nvidia.com and generate an API key. The steps are as follows:
- In your browser, visit `ngc.nvidia.com`.
- Click `Welcome Guest`, then click `Sign In/Sign Up` in the dropdown menu.
- Click the `continue` button next to `NVIDIA Account (use existing or create a new NVIDIA ac-)`.
- Click `Create account` and complete the registration. Then log in with your new account credentials.
- In the top right corner, click your `username`, then click `Setup` in the dropdown menu.
- Click the `Get API Key` button.
- In the top right corner, click the `Generate API Key` button. When the dialog box appears, click `confirm`.
- Finally, copy your generated API key and username and save them somewhere on your local system.

<img align="center" src="images/ngc_setup_key.PNG" width="600"> 
<img align="center" src="images/ngc_key.PNG" width="700">

## API Key

- Your API key represents your credentials
  - Used for programmatic interaction (e.g., docker, REST API, etc.)
  - Uniquely identifies you (think “Username & Password”)
  - There can be only one (regenerating your API key invalidates the old one)
- Programmatic interface at `nvcr.io`: Use API Key

## 0. Set up env variables <a class="anchor" id="head-0"></a>

Copy your API key from where you saved it and paste it in place of the placeholder text in `%env KEY=...` below. Do not wrap the key in quotes: the `%env` magic stores the value exactly as typed, so any quotes would become part of the key.

In [None]:
%env USER_EXPERIMENT_DIR=/workspace/tlt-experiments/classification
%env DATA_DOWNLOAD_DIR=/workspace/tlt-experiments/data
#%env SPECS_DIR=/workspace/tlt-experiments/classification/specs
%env SPECS_DIR=/workspace/tlt-experiments/specs
%env KEY=place_your_ngc_api_key_here
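Before moving on, it can help to confirm that `KEY` was actually set. The sketch below uses a hypothetical helper, `check_key` (not part of TLT), that flags an unset key, leftover placeholder text, or quote characters carried over from the `%env` line.

```python
import os

# Placeholder text shipped with this notebook (quotes and underscores are ignored)
PLACEHOLDER = "place your ngc api key here"

def check_key(value):
    """Hypothetical helper: return a list of problems found with the KEY value."""
    if not value:
        return ["KEY is not set"]
    problems = []
    # %env keeps quotes literally, so a quoted key would include them
    if value[0] in "'\"" or value[-1] in "'\"":
        problems.append("KEY contains quote characters; %env keeps them literally")
    normalized = value.strip("'\"").replace("_", " ").strip().lower()
    if normalized == PLACEHOLDER:
        problems.append("KEY still holds the placeholder text")
    return problems

issues = check_key(os.environ.get("KEY"))
print("KEY looks OK" if not issues else "\n".join(issues))
```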


## 1. Prepare datasets and pre-trained model <a class="anchor" id="head-1"></a>

We will use the Pascal VOC 2012 dataset for this tutorial. For more details, visit
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#devkit. If you intend to run this notebook on your local workstation without using a container, please download the dataset from http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar to `$DATA_DOWNLOAD_DIR` (`/workspace/tlt-experiments/data`).

In [None]:
# Check that file is present
import os
DATA_DIR = os.environ.get('DATA_DOWNLOAD_DIR')
if not os.path.isfile(os.path.join(DATA_DIR , 'VOCtrainval_11-May-2012.tar')):
    print('tar file for dataset not found. Please download.')
else:
    print('Found dataset.')

In [None]:
# unpack 
!tar -xvf $DATA_DOWNLOAD_DIR/VOCtrainval_11-May-2012.tar -C $DATA_DOWNLOAD_DIR 

In [None]:
# verify
!ls $DATA_DOWNLOAD_DIR/VOCdevkit/VOC2012

### A. Split the dataset into train/val/test <a class="anchor" id="head-1-1"></a>

The next two cells convert the Pascal VOC dataset into one folder per class (the layout used here for classification) and then split those folders into train/val/test sets.

In [None]:
from os.path import join as join_path
import os
import glob
import re
import shutil

DATA_DIR=os.environ.get('DATA_DOWNLOAD_DIR')
source_dir = join_path(DATA_DIR, "VOCdevkit/VOC2012")
target_dir = join_path(DATA_DIR, "formatted")


suffix = '_trainval.txt'
classes_dir = join_path(source_dir, "ImageSets", "Main")
images_dir = join_path(source_dir, "JPEGImages")
classes_files = glob.glob(classes_dir+"/*"+suffix)
for file in classes_files:
    # get the filename and make output class folder
    classname = os.path.basename(file)
    if classname.endswith(suffix):
        classname = classname[:-len(suffix)]
        target_dir_path = join_path(target_dir, classname)
        if not os.path.exists(target_dir_path):
            os.makedirs(target_dir_path)
    else:
        continue
    print(classname)


    with open(file) as f:
        content = f.readlines()


    for line in content:
        tokens = re.split(r'\s+', line)  # raw string avoids an invalid-escape warning
        if tokens[1] == '1':
            # copy this image into target dir_path
            target_file_path = join_path(target_dir_path, tokens[0] + '.jpg')
            src_file_path = join_path(images_dir, tokens[0] + '.jpg')
            shutil.copyfile(src_file_path, target_file_path)
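Each `<class>_trainval.txt` file under `ImageSets/Main` lists one image ID per line, followed by a flag: `1` if the image contains the class, `-1` if it does not, and `0` for "difficult" examples. The cell above copies only the `1` entries, so an image that is positive for several classes ends up in more than one class folder. As a small standalone sketch, the same filtering looks like this (`positive_ids` and the sample IDs are hypothetical, not part of TLT or the VOC devkit):

```python
def positive_ids(lines):
    """Return image IDs whose VOC label flag is 1 (class present, not difficult)."""
    ids = []
    for line in lines:
        tokens = line.split()  # split on any run of whitespace
        if len(tokens) >= 2 and tokens[1] == "1":
            ids.append(tokens[0])
    return ids

# Hypothetical lines in the <image_id> <flag> format; the blank line is skipped
sample = ["2008_000002 -1\n", "2008_000003  1\n", "2008_000007  0\n", "\n"]
print(positive_ids(sample))  # ['2008_000003']
```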

In [None]:
import os
import glob
import shutil
from random import shuffle
from tqdm import tqdm_notebook as tqdm

DATA_DIR = os.environ.get('DATA_DOWNLOAD_DIR')
SOURCE_DIR = os.path.join(DATA_DIR, 'formatted')  # this cell no longer depends on join_path
TARGET_DIR = os.path.join(DATA_DIR, 'split')
# list the class sub-directories
dir_list = next(os.walk(SOURCE_DIR))[1]
# for each class dir, create matching dirs under split/train, split/val and split/test
for dir_i in tqdm(dir_list):
    newdir_train = os.path.join(TARGET_DIR, 'train', dir_i)
    newdir_val = os.path.join(TARGET_DIR, 'val', dir_i)
    newdir_test = os.path.join(TARGET_DIR, 'test', dir_i)

    os.makedirs(newdir_train, exist_ok=True)
    os.makedirs(newdir_val, exist_ok=True)
    os.makedirs(newdir_test, exist_ok=True)

    img_list = glob.glob(os.path.join(SOURCE_DIR, dir_i, '*.jpg'))
    # shuffle data
    shuffle(img_list)

    # 70% train, 10% val, 20% test
    n_train_end = int(len(img_list) * 0.7)
    n_val_end = int(len(img_list) * 0.8)
    for img in img_list[:n_train_end]:
        shutil.copy2(img, newdir_train)
    for img in img_list[n_train_end:n_val_end]:
        shutil.copy2(img, newdir_val)
    for img in img_list[n_val_end:]:
        shutil.copy2(img, newdir_test)

print('Done splitting dataset.')

In [None]:
!ls $DATA_DOWNLOAD_DIR/split/test/cat
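Beyond listing one folder, a quick count per split and class can confirm that the split looks reasonable. This is a minimal sketch, assuming the `split/<split>/<class>/*.jpg` layout created above; `count_split` is a hypothetical helper, not a TLT tool.

```python
import os
from collections import defaultdict

def count_split(split_root):
    """Return {split: {class_name: number_of_jpgs}} for a split/<split>/<class>/ tree."""
    counts = defaultdict(dict)
    for split in sorted(os.listdir(split_root)):
        split_dir = os.path.join(split_root, split)
        if not os.path.isdir(split_dir):
            continue
        for cls in sorted(os.listdir(split_dir)):
            cls_dir = os.path.join(split_dir, cls)
            if os.path.isdir(cls_dir):
                # count only .jpg files, matching the glob used when splitting
                counts[split][cls] = sum(1 for f in os.listdir(cls_dir) if f.endswith(".jpg"))
    return dict(counts)

split_root = os.path.join(os.environ.get("DATA_DOWNLOAD_DIR", "."), "split")
if os.path.isdir(split_root):
    for split, per_class in count_split(split_root).items():
        print(split, sum(per_class.values()), "images in", len(per_class), "classes")
```

With the 70/10/20 ratios used in the split cell, each class's train count should be roughly seven times its val count.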

### B. Download pretrained models <a class="anchor" id="head-1-2"></a>

- View the list of pretrained models available for the classification domain

In [None]:
!ngc registry model list nvidia/tlt_pretrained_classification:*

- Create a folder named `pretrained_resnet18` where the ResNet-18 model pulled from NGC will be stored

In [None]:
!mkdir -p $USER_EXPERIMENT_DIR/pretrained_resnet18/

- Pull the ResNet-18 pretrained model from NGC

In [None]:
!ngc registry model download-version nvidia/tlt_pretrained_classification:resnet18 --dest $USER_EXPERIMENT_DIR/pretrained_resnet18

- Check that the model was downloaded into the directory

In [None]:
!ls -l $USER_EXPERIMENT_DIR/pretrained_resnet18/tlt_pretrained_classification_vresnet18

## 2. Provide training specification <a class="anchor" id="head-2"></a>
* Training dataset
* Validation dataset
* Pre-trained models
* Other training (hyper-)parameters such as batch size, number of epochs, learning rate etc.

- Run the cell below to view the model spec configuration file. **Your task is to modify the hyperparameters to achieve the desired accuracy.** You can open the `classification_spec.cfg` file from the `specs` folder in the JupyterLab file browser on the left. Remember to save the file with `Ctrl+S` after editing it, then rerun the cell below to check that your changes are reflected.

In [None]:
!cat $SPECS_DIR/classification_spec.cfg
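To confirm that an edit took effect, you can also pull a few scalar settings out of the spec in Python. The spec is written in protobuf text format; this sketch does not parse it properly, it only matches simple `name: value` lines with a regular expression and ignores the nesting. The field names used below (`batch_size_per_gpu`, `n_epochs`) are assumptions, so check them against your own spec file.

```python
import os
import re

# Matches simple "name: value" lines such as "n_epochs: 80" or 'arch: "resnet"'
FIELD_RE = re.compile(r'^\s*(\w+)\s*:\s*"?([^"\s{]+)"?\s*$')

def scalar_fields(text):
    """Return {field_name: [values as strings]} for every simple scalar line."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields.setdefault(match.group(1), []).append(match.group(2))
    return fields

spec_path = os.path.join(os.environ.get("SPECS_DIR", "."), "classification_spec.cfg")
if os.path.isfile(spec_path):
    with open(spec_path) as spec_file:
        fields = scalar_fields(spec_file.read())
    for name in ("batch_size_per_gpu", "n_epochs"):  # assumed field names
        print(name, fields.get(name))
```

Values come back as strings and as lists, because the same field name can appear in several blocks.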

## 3. Run TLT training <a class="anchor" id="head-3"></a>
* Provide the spec file and the output directory location for the models.
- Run the cell below to train on a **single GPU**.
- Parameter definitions:
     - `-e`: spec file; `-k`: encoding key (your API key); `-r`: results directory; `--gpu_index`: index of the GPU to use; `--init_epoch`: epoch to resume training from

In [None]:
!classification train -e $SPECS_DIR/classification_spec.cfg -r $USER_EXPERIMENT_DIR/output -k $KEY

- To run this training using **multiple GPUs**, uncomment the cell below and set the `--gpus` parameter to the number of GPUs you wish to use. Note that each team is restricted to a maximum of 2 GPUs on the cluster.

In [None]:
#!classification train -e $SPECS_DIR/classification_spec.cfg \
#                       -r $USER_EXPERIMENT_DIR/output \
#                       -k $KEY --gpus 2

- To resume from a **checkpoint**, use `--init_epoch` together with the checkpoint configured in the spec file.
- Make sure that `model_path` in the spec file points to the `.tlt` file of the epoch you wish to resume from. You can choose from the files in the `$USER_EXPERIMENT_DIR/output/weights` folder.

In [None]:
# !classification train -e $SPECS_DIR/classification_spec.cfg \
#                        -r $USER_EXPERIMENT_DIR/output \
#                        -k $KEY --gpus 2 \
#                        --init_epoch N
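If training was interrupted, you need the last epoch that was actually saved before setting `--init_epoch` and `model_path`. This sketch uses a hypothetical helper, `latest_checkpoint` (not a TLT command). It scans the weights folder for files named like `resnet_080.tlt`, the naming pattern used elsewhere in this notebook, and returns the highest epoch it finds.

```python
import os
import re

# Checkpoint names follow the resnet_<EPOCH>.tlt pattern seen in this notebook
CKPT_RE = re.compile(r"^resnet_(\d+)\.tlt$")

def latest_checkpoint(weights_dir):
    """Return (epoch, path) of the highest-epoch checkpoint, or None if there is none."""
    best = None
    for name in os.listdir(weights_dir):
        match = CKPT_RE.match(name)
        if match:
            epoch = int(match.group(1))
            if best is None or epoch > best[0]:
                best = (epoch, os.path.join(weights_dir, name))
    return best

weights_dir = os.path.join(os.environ.get("USER_EXPERIMENT_DIR", "."), "output", "weights")
if os.path.isdir(weights_dir):
    print(latest_checkpoint(weights_dir))
```

The epoch is returned as an integer. To rebuild the zero-padded form used in `$EPOCH` file names, format it with `'{:03d}'.format(epoch)`.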

## 4. Evaluate trained models <a class="anchor" id="head-4"></a>

In this step, we assume that the training is complete and the model from the final epoch (`resnet_080.tlt`) is available. If you would like to run evaluation on an earlier model, please edit the spec file at `$SPECS_DIR/classification_spec.cfg` to point to the intended model.

In [None]:
!classification evaluate -e $SPECS_DIR/classification_spec.cfg -k $KEY

## 5. Prune trained models <a class="anchor" id="head-5"></a>
* Specify the trained model to prune
* Equalization criterion
* Threshold for pruning
* Exclude layers that you don't want pruned (e.g. the prediction layer)

Usually, you only need to adjust `-pth` (threshold) to trade off accuracy against model size. A higher `pth` gives you a smaller model (and thus faster inference) but lower accuracy. The right threshold depends on the dataset; the value of 0.6 used below is just a starting point. If the retrained accuracy is good, you can increase this value to get a smaller model. Otherwise, lower it to get better accuracy.
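The effect of `-pth` can be illustrated with a simplified norm-based criterion. Score each filter by the L2 norm of its weights, divide by the largest norm in the layer, and drop filters that fall below the threshold. This is an illustrative assumption about how a magnitude threshold behaves, not TLT's exact pruning algorithm: TLT's pruner also applies the `-eq` equalization criterion and other settings that this sketch ignores. `kept_filters` and the weight values are hypothetical.

```python
import math

def kept_filters(filter_weights, threshold):
    """Return indices of filters whose max-normalized L2 norm is >= threshold.

    filter_weights: one flat list of weights per output filter (illustrative layout).
    """
    norms = [math.sqrt(sum(w * w for w in weights)) for weights in filter_weights]
    largest = max(norms)
    if largest == 0:
        return []  # all-zero layer: nothing meaningful to keep
    return [i for i, n in enumerate(norms) if n / largest >= threshold]

filters = [[1.0, 0.0], [0.3, 0.4], [0.0, 0.8], [0.1, 0.1]]  # norms: 1.0, 0.5, 0.8, about 0.14
for pth in (0.1, 0.6, 0.9):
    print(pth, kept_filters(filters, pth))
```

Raising the threshold keeps fewer filters, which is why a higher `pth` yields a smaller but usually less accurate model.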

In [None]:
# Define the checkpoint epoch of the model to be pruned.
# This must not exceed the number of epochs training actually ran, in case training was interrupted early.
# By default, the final model is at epoch 080.
%env EPOCH=080
!mkdir -p $USER_EXPERIMENT_DIR/output/resnet_pruned
!classification prune -m $USER_EXPERIMENT_DIR/output/weights/resnet_$EPOCH.tlt \
           -o $USER_EXPERIMENT_DIR/output/resnet_pruned/resnet18_nopool_bn_pruned.tlt \
           -eq union \
           -pth 0.6 \
           -k $KEY

In [None]:
print('Pruned model:')
print('------------')
!ls -r1t $USER_EXPERIMENT_DIR/output/resnet_pruned

## 6. Retrain pruned models <a class="anchor" id="head-6"></a>
* The model needs to be retrained to recover the accuracy lost during pruning.
- Run the cell below to view the retrain spec configuration file. Your task is to modify the hyperparameters to achieve the desired accuracy. You can open the `classification_retrain_spec.cfg` file from the `specs` folder in the JupyterLab file browser on the left. Remember to save the file with `Ctrl+S` after editing it, then rerun the cell below to check that your changes are reflected.

In [None]:
!cat $SPECS_DIR/classification_retrain_spec.cfg

In [None]:
!classification train -e $SPECS_DIR/classification_retrain_spec.cfg \
                      -r $USER_EXPERIMENT_DIR/output_retrain \
                      -k $KEY

## 7. Testing the model! <a class="anchor" id="head-7"></a>

In this step, we assume that retraining is complete and the model from the final epoch (`resnet_080.tlt`) is available. If you would like to evaluate an earlier model, edit the spec file at `$SPECS_DIR/classification_retrain_spec.cfg` to point to the intended model.

In [None]:
!classification evaluate -e $SPECS_DIR/classification_retrain_spec.cfg -k $KEY

## 8. Visualize Inferences <a class="anchor" id="head-8"></a>

To see the predictions of our model on test images, we can use the `classification inference` command. Note that models trained for more epochs usually give better results. First, we'll run inference in single-image mode.

In [None]:
# Choosing a random test image from the test set.
import os
import random

test_dataset = os.path.join(os.environ.get('DATA_DOWNLOAD_DIR'), 'split', 'test')
classes = [item for item in os.listdir(test_dataset) if os.path.isdir(os.path.join(test_dataset,item))]
class_under_test = random.choice(classes)
test_image_dir = os.path.join(test_dataset, class_under_test)
image_list = [os.path.join(test_image_dir, item) for item in os.listdir(test_image_dir)
              if item.endswith('.jpg')]
os.environ['TEST_IMAGE'] = random.choice(image_list)

print("Input image is from class: {}".format(class_under_test))
print("Image path is: {}".format(os.environ['TEST_IMAGE']))

- Define the checkpoint epoch to use for the subsequent steps. It must not exceed the number of epochs training actually ran, in case training was interrupted early. By default, the final model is at epoch 080.
- Parameter definitions:
  - `-m`: retrained model; `-e`: retrain spec file; `-cm`: classmap; `-k`: encoding key; `-b`: batch size; `-i`: test image (single-image mode); `-d`: test data directory (directory mode)

In [None]:
%env EPOCH=080

In [None]:
!classification inference -e $SPECS_DIR/classification_retrain_spec.cfg \
                          -m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_$EPOCH.tlt \
                          -k $KEY -b 32 -i $TEST_IMAGE \
                          -cm $USER_EXPERIMENT_DIR/output_retrain/classmap.json

We can also run inference in directory mode on a whole folder of test images.

In [None]:
!classification inference -e $SPECS_DIR/classification_retrain_spec.cfg \
                          -m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_$EPOCH.tlt \
                          -k $KEY -b 32 -d $DATA_DOWNLOAD_DIR/split/test/person \
                          -cm $USER_EXPERIMENT_DIR/output_retrain/classmap.json

The cell above also writes a `result.csv` file to the same directory. The Python cell below reads this CSV file and visualizes a few of the predictions.

In [None]:
import matplotlib.pyplot as plt
from PIL import Image 
import os
import csv
from math import ceil

DATA_DIR = os.environ.get('DATA_DOWNLOAD_DIR')
csv_path = os.path.join(DATA_DIR, 'split', 'test', 'person', 'result.csv')
results = []
with open(csv_path) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        results.append((row[0], row[1]))

w,h = 200,200
fig = plt.figure(figsize=(30,30))
columns = 5
rows = 1
for i in range(1, columns*rows + 1):
    ax = fig.add_subplot(rows, columns,i)
    img = Image.open(results[i][0])
    img = img.resize((w,h), Image.LANCZOS)  # LANCZOS replaces ANTIALIAS, which Pillow 10 removed
    plt.imshow(img)
    ax.set_title(results[i][1], fontsize=40)
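Because every image in `split/test/person` belongs to the `person` class, `result.csv` can also give a quick per-class accuracy. This sketch assumes the same column layout the plotting cell reads (image path first, predicted label second), no header row, and labels written as class names such as `person`. `class_accuracy` is a hypothetical helper.

```python
import csv
import os

def class_accuracy(rows, expected_label):
    """Fraction of (path, label) rows whose predicted label equals expected_label."""
    rows = list(rows)
    if not rows:
        return 0.0
    hits = sum(1 for _, label in rows if label == expected_label)
    return hits / len(rows)

csv_path = os.path.join(os.environ.get("DATA_DOWNLOAD_DIR", "."), "split", "test", "person", "result.csv")
if os.path.isfile(csv_path):
    with open(csv_path) as csv_file:
        # keep only rows that have at least a path and a label column
        rows = [(row[0], row[1]) for row in csv.reader(csv_file) if len(row) >= 2]
    print("Fraction predicted as 'person': {:.3f}".format(class_accuracy(rows, "person")))
```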

## 9. Export and Deploy! <a class="anchor" id="head-9"></a>

In [None]:
!classification export \
            -m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_$EPOCH.tlt \
            -o $USER_EXPERIMENT_DIR/export/final_model.etlt \
            -k $KEY

In [None]:
print('Exported model:')
print('------------')
!ls -lh $USER_EXPERIMENT_DIR/export/

### A. Int8 Optimization <a class="anchor" id="head-9-1"></a>
The classification model supports INT8 optimization for inference in TensorRT (TRT). To use it, we must calibrate the model to run 8-bit inference. This involves two steps:

* Generate a calibration tensorfile from the training data using `tlt-int8-tensorfile`.
* Use `classification export` to generate the INT8 calibration table.

*Note: For this example, we generate a calibration tensorfile containing 10 batches of training data.
Ideally, use at least 10-20% of the training data to calibrate the model.*
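To size the calibration set against the 10-20% guideline, you can convert a percentage of the training images into a batch count for `-m` and `--batches`. This is a minimal sketch, assuming each batch holds `batch_size` images. The value 16 below is a placeholder rather than a value taken from the spec file, so check which batch size your tools actually use. `calibration_batches` is a hypothetical helper.

```python
import os

def calibration_batches(num_train_images, batch_size, percent=10):
    """Smallest number of batches covering `percent`% of the training images."""
    numerator = num_train_images * percent
    denominator = 100 * batch_size
    return -(-numerator // denominator)  # integer ceiling division, no float rounding

train_dir = os.path.join(os.environ.get("DATA_DOWNLOAD_DIR", "."), "split", "train")
if os.path.isdir(train_dir):
    n_train = sum(len([f for f in files if f.endswith(".jpg")]) for _, _, files in os.walk(train_dir))
    # batch_size=16 is an assumed value; use the batch size from your own setup
    print(calibration_batches(n_train, batch_size=16, percent=10))
```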

In [None]:
!tlt-int8-tensorfile classification -e $SPECS_DIR/classification_retrain_spec.cfg \
                                    -m 10 \
                                    -o $USER_EXPERIMENT_DIR/export/calibration.tensor

In [None]:
# Remove the pre-existing exported .etlt file.
!rm -rf $USER_EXPERIMENT_DIR/export/final_model.etlt
!classification export \
            -m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_$EPOCH.tlt \
            -o $USER_EXPERIMENT_DIR/export/final_model.etlt \
            -k $KEY \
            --cal_data_file $USER_EXPERIMENT_DIR/export/calibration.tensor \
            --data_type int8 \
            --batches 10 \
            --cal_cache_file $USER_EXPERIMENT_DIR/export/final_model_int8_cache.bin \
            -v 

---
### Source

This Notebook was adapted from examples within NVIDIA TLT/TAO Docker container pulled from ngc.nvidia.com

### Licensing 

This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0).