# Distributed Deep Learning Bootcamp

## Learning objectives

The objective of this bootcamp is to introduce participants to the fundamentals of distributed deep learning and to give hands-on experience with methods that can be applied to deep learning models for faster training. The bootcamp assumes familiarity with deep learning fundamentals; if you are new to deep learning, please go through the [AI for Climate Bootcamp](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/hpc_ai/ai_science_climate) first.

* Standard: Python
* Frameworks: Horovod, TensorFlow

The bootcamp requires more than one GPU, and we recommend using a [DGX](https://www.nvidia.com/en-in/data-center/dgx-systems/)-like cluster with [NVLink / NVSwitch](https://www.nvidia.com/en-in/data-center/nvlink/) support.

Let's start by checking the GPUs you will be running the code on in this bootcamp.

!nvidia-smi
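Because the bootcamp needs more than one GPU, it can help to check the GPU count programmatically before moving on. The following is a minimal, stdlib-only sketch that assumes `nvidia-smi` is on the `PATH` and relies on `nvidia-smi -L` printing one `GPU N: ...` line per device. The helper names `count_gpus_from_listing` and `count_gpus` are hypothetical and not part of the bootcamp notebooks. Note that `nvidia-smi` reports every GPU the driver sees, regardless of `CUDA_VISIBLE_DEVICES`.

```python
import shutil
import subprocess


def count_gpus_from_listing(listing: str) -> int:
    """Count GPUs in text printed by `nvidia-smi -L` (assumed: one 'GPU N: ...' line per device)."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def count_gpus() -> int:
    """Return the number of GPUs reported by nvidia-smi, or 0 if the tool is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return 0
    return count_gpus_from_listing(result.stdout)


if __name__ == "__main__":
    n = count_gpus()
    print(f"GPUs reported by nvidia-smi: {n}")
    if n < 2:
        print("Warning: this bootcamp requires more than one GPU.")
```

Parsing is kept in a separate function so it can be checked against sample `nvidia-smi -L` output without needing a GPU.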

### Tutorial Outline

The following topics will be covered during the bootcamp:

- [**Introduction to Distributed deep learning**](jupyter_notebook/1.Introduction-to-Distributed-Deep-Learning.ipynb)
 - [The need for Distributed Deep Learning](jupyter_notebook/1.Introduction-to-Distributed-Deep-Learning.ipynb#The-need-for-Distributed-Deep-Learning)
 - [Different types of distributed deep learning and their applications](jupyter_notebook/1.Introduction-to-Distributed-Deep-Learning.ipynb#Differnet-types-of-Distributed-Deep-learning-and-it's-applications)
 - [Training and Inference](jupyter_notebook/1.Introduction-to-Distributed-Deep-Learning.ipynb#Training-and-Inference)
 - [Data and Model Parallelism](jupyter_notebook/1.Introduction-to-Distributed-Deep-Learning.ipynb#Data-and-Model-Parallelism)
 - [Framework and NVIDIA NGC Support - Optional](jupyter_notebook/1.Introduction-to-Distributed-Deep-Learning.ipynb#Framework-and-NVIDIA-NGC-Support---Optional)
 - [Demo - Scalability across multiple GPUs](jupyter_notebook/1.Introduction-to-Distributed-Deep-Learning.ipynb#Demo---Scalability-across-multiple-GPUs) 


- [**System Topology**](jupyter_notebook/2.1.System-Topology.ipynb)
 - [Understanding System Topology](jupyter_notebook/2.1.System-Topology.ipynb#Understanding-System-Topology)
 - [Communication concepts](jupyter_notebook/2.1.System-Topology.ipynb#Communication-concepts)
 - [Intra-Node Communication Topology](jupyter_notebook/2.1.System-Topology.ipynb#Intra-Node-communication-Topology)
 - [Performance variation due to system topology](jupyter_notebook/2.1.System-Topology.ipynb#Performance-variation-due-to-system-topology)
 - [Profiling using DLProf](jupyter_notebook/2.1.System-Topology.ipynb#Profiling-using-DLProf)
 - [NCCL](jupyter_notebook/2.1.System-Topology.ipynb#NCCL)
 - [NCCL_P2P_LEVEL=0 or P2P Disabled](jupyter_notebook/2.1.System-Topology.ipynb#NCCL_P2P_LEVEL=0-or-P2P-Disabled)
 - [NCCL_P2P_LEVEL=1 or P2P via PCIe](jupyter_notebook/2.1.System-Topology.ipynb#NCCL_P2P_LEVEL=1-or-P2P-via-PCIe)
 - [Benchmarking the system topology](jupyter_notebook/2.1.System-Topology.ipynb#Benchmarking-the-system-topology)


- [**Hands-on with Distributed training**](jupyter_notebook/3.Hands-on-Multi-GPU.ipynb)
 - [Tensorflow - Keras](jupyter_notebook/3.Hands-on-Multi-GPU.ipynb#Tensorflow---Keras)
 - [Horovod](jupyter_notebook/3.Hands-on-Multi-GPU.ipynb#Horovod)



- [**Challenges with convergence**](jupyter_notebook/4.Convergence.ipynb)
 - [Concepts](jupyter_notebook/4.Convergence.ipynb#Concepts)
 - [Impact of Batch size](jupyter_notebook/4.Convergence.ipynb#Impact-of-Batch-size)
 - [Impact on test and validation accuracy](jupyter_notebook/4.Convergence.ipynb#Impact-on-test-and-validation-accuracy)
 - [Techniques for faster convergence](jupyter_notebook/4.Convergence.ipynb#Techniques-for-faster-convergence)
 - [Batch norm](jupyter_notebook/4.Convergence.ipynb#Batch-norm)
 - [Learning rate scaling](jupyter_notebook/4.Convergence.ipynb#Learning-rate-scaling)
 - [Learning rate warmup](jupyter_notebook/4.Convergence.ipynb#Learning-rate-warmup)
 - [Using Optimizers built for Exascale Deep learning](jupyter_notebook/4.Convergence.ipynb#Using-Optimizers-built-for-Exascale-Deep-learning)



### Tutorial Duration
The lab material will be presented in a 6-hour session. A link to download the material is available at the end of the lab.

### Content Level
Beginner, Intermediate

### Target Audience and Prerequisites
The target audience for this lab is researchers, graduate students, and developers who are interested in scaling their deep learning systems to multiple GPUs to accelerate their scientific applications.

A basic understanding of deep learning is required. If you are new to deep learning, we recommend going through the [AI for Climate Bootcamp](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/hpc_ai/ai_science_climate) first.
 
**Disclaimer**: All results reported in the notebooks were obtained on a *DGX-1 machine equipped with 8 x Tesla V100 GPUs connected via NVLink*. Results will vary on different hardware and also depend on the interconnect bandwidth and the thermal conditions of the machine.

--- 

## Licensing

This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International License (CC BY 4.0).