

NVIDIA TAO Toolkit

This folder contains the material for the TAO Toolkit (formerly TLT) learning bootcamp, which covers:

  • Transfer learning with NVIDIA TAO
  • Pretrained models from NGC
  • Hands-on image classification lab that involves training, evaluation, pruning, retraining, inference, model export, and INT8 optimization

Prerequisites

To run this tutorial, you will need a machine with an NVIDIA GPU.
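As a quick way to confirm the prerequisite, the following sketch checks whether a GPU is visible. It assumes the NVIDIA driver is installed and provides nvidia-smi; the NVIDIA_SMI variable and the has_nvidia_gpu function are illustrative names, not part of the bootcamp material.

```shell
#!/usr/bin/env bash
# Report whether an NVIDIA GPU is visible on this machine.
# Assumption: the NVIDIA driver is installed and ships nvidia-smi.
# NVIDIA_SMI is a hypothetical override for a non-standard install path.
NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"

has_nvidia_gpu() {
    # Fail early if the tool is not available at all.
    command -v "$NVIDIA_SMI" >/dev/null 2>&1 || return 1
    # "nvidia-smi -L" prints one "GPU <n>: ..." line per detected GPU.
    "$NVIDIA_SMI" -L 2>/dev/null | grep -q '^GPU '
}

if has_nvidia_gpu; then
    echo "NVIDIA GPU detected:"
    "$NVIDIA_SMI" --query-gpu=name,memory.total --format=csv,noheader
else
    echo "No NVIDIA GPU detected; the labs will not run on this machine."
fi
```

If the script reports no GPU on a machine that has one, the driver installation is the first thing to check.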

Tutorial Duration

The bootcamp material takes approximately 2 hours to complete.

Creating containers

To start with, you will have to build a Docker or Singularity container.

Docker Container

To build a docker container, run: sudo docker build -t <imagename>:<tagnumber> .

For instance: sudo docker build -t myimage:1.0 .

The labs are written as Jupyter Lab notebooks, and a Dockerfile is provided to simplify deployment. To serve the Docker instance to a student, port 8888 must be exposed from the container. For instance, the following command maps port 8888 inside the container to port 8888 on the lab machine:

sudo docker run --rm -it --gpus=all -p 8888:8888 -p 8000:8000 myimage:1.0

or, on setups that use the nvidia runtime instead of the --gpus flag:

sudo docker run --rm -it --runtime=nvidia -p 8888:8888 myimage:1.0

When this command is run, you can browse to the serving machine on port 8888 with any web browser to access the labs, and on port 8000 for the dlprofviewer server. For instance, if the container is running on the local machine, point the web browser to http://localhost:8888. The --gpus=all flag makes all NVIDIA GPUs available to the container. The --rm flag removes the container automatically when it exits. The -it flags attach an interactive terminal, so the Jupyter server can be stopped with Ctrl-C. This command may be customized for your hosting environment.
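One possible customization is serving the labs on a different host port. The sketch below only prints the resulting command rather than running it; docker_run_cmd is a hypothetical helper, and the default image name myimage:1.0 is taken from the build example.

```shell
#!/usr/bin/env bash
# Print the docker run command for a chosen host port and image.
# The container side stays at 8888 (Jupyter) and 8000 (dlprofviewer);
# only the host side of the Jupyter mapping changes.

docker_run_cmd() {
    local host_port="${1:-8888}"     # port students browse to on the lab machine
    local image="${2:-myimage:1.0}"  # image name and tag from the build step
    echo "sudo docker run --rm -it --gpus=all -p ${host_port}:8888 -p 8000:8000 ${image}"
}

# Example: serve the labs on host port 9999 instead of 8888.
docker_run_cmd 9999
# prints: sudo docker run --rm -it --gpus=all -p 9999:8888 -p 8000:8000 myimage:1.0
```

With this mapping the labs would be reached at http://localhost:9999, while Jupyter inside the container still listens on 8888.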

Once inside the container, launch Jupyter Lab by typing the following command:

jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace/tlt-experiments

Then open Jupyter Lab in a browser at http://localhost:8888 and start working on the lab by clicking on the Start_here.ipynb notebook.

Singularity Container

To build the singularity container, run: sudo singularity build --sandbox <image_name>.simg Singularity

For instance: sudo singularity build --sandbox myimage.simg Singularity

Then copy the files to your local machine so that changes are stored locally:

singularity run --writable <image_name>.simg cp -rT /workspace/tlt-experiments/ ~/workspace

Then, run the container:

singularity run --nv --writable <image_name>.simg jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace/tlt-experiments

Then open Jupyter Lab in a browser at http://localhost:8888 and start working on the lab by clicking on the Start_Here.ipynb notebook.
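The three Singularity steps above can be collected in one place so that the image name is set only once. This is a dry-run sketch that prints the commands without executing them; singularity_steps is a hypothetical helper name, and the commands themselves are copied from this section.

```shell
#!/usr/bin/env bash
# Print the build, copy, and run steps for one Singularity image name.
# Nothing is executed; copy the printed lines into a terminal to run them.

singularity_steps() {
    local image="${1:-myimage}.simg"
    local jupyter='jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace/tlt-experiments'
    # Step 1: build a writable sandbox image from the Singularity recipe.
    echo "sudo singularity build --sandbox ${image} Singularity"
    # Step 2: copy the lab files out so that changes are stored locally.
    echo "singularity run --writable ${image} cp -rT /workspace/tlt-experiments/ ~/workspace"
    # Step 3: start Jupyter Lab with NVIDIA GPU support (--nv).
    echo "singularity run --nv --writable ${image} ${jupyter}"
}

singularity_steps myimage
```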

Known Issues

Q. A "ResourceExhaustedError" is observed while running the labs.

A. The batch size and network model are currently set to consume 16GB of GPU memory. To use the labs without modification, a GPU with at least 16GB of memory is recommended. Otherwise, reduce the batch size to lower the memory footprint.
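To see in advance whether this issue is likely, GPU memory can be compared against the requirement. The sketch below is an assumption-laden check: the MIN_MIB threshold of 15000 MiB is a guess chosen because cards sold as 16GB may report somewhat less than 16384 MiB, and enough_gpu_memory is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Check whether each GPU has enough memory for the default lab settings.
# Assumption: MIN_MIB=15000 approximates "16GB", since such cards may
# report slightly less than 16384 MiB. Adjust it for your hardware.
MIN_MIB="${MIN_MIB:-15000}"

enough_gpu_memory() {
    # $1: total GPU memory in MiB (integer)
    [ "$1" -ge "$MIN_MIB" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # One line per GPU, memory in MiB without a unit suffix.
    nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits |
    while read -r mib; do
        # Skip anything that is not a plain integer (e.g. error text).
        case "$mib" in ''|*[!0-9]*) continue ;; esac
        if enough_gpu_memory "$mib"; then
            echo "GPU with ${mib} MiB: default batch size should fit"
        else
            echo "GPU with ${mib} MiB: consider reducing the batch size"
        fi
    done
fi
```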

  • Please go through the list of existing bugs/issues or file a new issue on GitHub.