Anish Saxena c16efe6ea9 Made slurm_pmi_config/lib directory visible in repo 4 years ago

N-Ways to Multi-GPU Programming

This bootcamp focuses on multi-GPU programming models.

Scaling applications to multiple GPUs across multiple nodes requires proficiency not only in programming models and optimization techniques, but also in root-cause analysis, using in-depth profiling to identify and minimize bottlenecks. In this bootcamp, participants learn to improve the performance of an application step by step, taking cues from profilers along the way. An understanding of the underlying technologies and communication topology also helps participants use high-performance NVIDIA libraries to extract more performance from the system.

Bootcamp Outline

  • Overview of single-GPU code and Nsight Systems Profiler
  • Single Node Multi-GPU:
    • CUDA Memcpy and Peer-to-Peer Memory Access
    • Intra-node topology
    • CUDA Streams and Events
  • Multi-Node Multi-GPU:
    • Introduction to MPI and Multi-Node execution overview
    • MPI with CUDA Memcpy
    • CUDA-aware MPI
    • Supplemental: Configuring MPI in a containerized environment
  • NVIDIA Collective Communications Library (NCCL)
  • NVSHMEM Library
  • Final remarks

NOTE: The NCCL, NVSHMEM, and Final Remarks notebooks are works in progress. All other notebooks are available.

Prerequisites

This bootcamp requires a multi-node system with multiple GPUs in each node (at least 2 GPUs per node).
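
The per-node GPU requirement can be checked before starting. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH of each node. The helper name `count_gpus` and the variable `MIN_GPUS` are illustrative and not part of the bootcamp.

```shell
#!/bin/sh
# Hypothetical helper: count GPUs from `nvidia-smi -L` style output on stdin.
# Each GPU is listed on a line that starts with "GPU <index>:".
count_gpus() {
    # grep -c prints 0 and exits non-zero when nothing matches; ignore that.
    grep -c '^GPU [0-9]' || true
}

MIN_GPUS=2  # illustrative threshold, taken from the prerequisite above

if command -v nvidia-smi >/dev/null 2>&1; then
    n=$(nvidia-smi -L | count_gpus)
    if [ "$n" -ge "$MIN_GPUS" ]; then
        echo "OK: found $n GPUs on $(hostname)"
    else
        echo "Only $n GPU(s) on $(hostname); at least $MIN_GPUS are needed"
    fi
else
    echo "nvidia-smi not found; cannot count GPUs on this node"
fi
```

The check covers a single node, so it needs to be run once on every node that will take part.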

A multi-node installation of NVIDIA's HPC SDK is recommended.

Otherwise, multi-node compatible versions of the following are required:

Testing

All code has been tested with CUDA driver 460.32.03, CUDA 11.3.0.0, OpenMPI 4.1.1, HPCX 2.8.1, Singularity 3.6.1, and NCCL 2.9.9.1.
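
To compare a local installation against these versions, the installed versions can be printed. This is a minimal sketch, assuming the tools are on the PATH. `report_version` is a hypothetical helper, and a missing tool is reported rather than treated as an error.

```shell
#!/bin/sh
# Hypothetical helper: print a header followed by a tool's version output,
# or "not found" when the command is not on the PATH.
#   $1 = label to print, remaining arguments = command to run
report_version() {
    label=$1
    shift
    printf '== %s ==\n' "$label"
    if command -v "$1" >/dev/null 2>&1; then
        # Some tools exit non-zero (e.g. no driver loaded); keep going anyway.
        "$@" 2>&1 || true
    else
        printf 'not found\n'
    fi
}

report_version "NVIDIA driver" nvidia-smi --query-gpu=driver_version --format=csv,noheader
report_version "CUDA compiler" nvcc --version
report_version "MPI launcher"  mpirun --version
report_version "Singularity"   singularity --version
```

The sketch only prints what is installed. Comparing the output with the tested versions is left to the reader.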

Containerless Build

Because this bootcamp covers multi-node CUDA-aware MPI concepts, it is primarily designed to run without containers. After the prerequisite software has been installed, follow these steps to install Jupyter Lab:

# Install Miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh 
bash Miniconda3-latest-Linux-x86_64.sh -b -p <my_dir>
# Add conda to PATH
export PATH=$PATH:<my_dir>/bin/
# Install Jupyter Lab
conda install -c conda-forge jupyterlab

Containerized Build

If containerization is desired, follow the steps outlined in the notebook MPI in Containerized Environments.

Building Singularity Container

singularity build multi_gpu_nways.simg Singularity

Running Jupyter Lab

Containerless Build

jupyter lab --notebook-dir=./labs/ --port=8000 --ip=0.0.0.0 --no-browser --NotebookApp.token=""

Containerized Build

singularity run --nv multi_gpu_nways.simg jupyter lab --notebook-dir=./labs/ --port=8000 --ip=0.0.0.0 --no-browser --NotebookApp.token="" 

Accessing Bootcamp

After starting Jupyter Lab, open http://localhost:8000 in a web browser and open the introduction.ipynb notebook.
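
When Jupyter Lab runs on a remote node rather than on the local machine, `localhost` in the browser does not reach it directly. One common approach is SSH local port forwarding. The sketch below only builds and prints the command. The function name `tunnel_cmd` and the destination `user@cluster-node` are illustrative placeholders, and it assumes the node running Jupyter Lab is reachable over SSH. The port matches the `--port=8000` passed to `jupyter lab` above.

```shell
#!/bin/sh
# Hypothetical helper: print an ssh command that forwards a local port to the
# same port on the remote host where Jupyter Lab is listening.
#   $1 = port (the --port value given to jupyter lab)
#   $2 = SSH destination (placeholder, e.g. user@cluster-node)
tunnel_cmd() {
    printf 'ssh -N -L %s:localhost:%s %s\n' "$1" "$1" "$2"
}

# Print the command for port 8000; run the printed line in a local terminal.
tunnel_cmd 8000 user@cluster-node
```

While the tunnel is open, the forwarded port is available on the local machine's `localhost`.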

Questions?

Please join the OpenACC Slack channel to ask questions.

If you observe any errors or issues, please file an issue on the GPUBootcamp GitHub repository.