This bootcamp focuses on multi-GPU programming models.
Scaling applications to multiple GPUs across multiple nodes requires one to be adept at not just the programming models and optimization techniques, but also at performing root-cause analysis using in-depth profiling to identify and minimize bottlenecks. In this bootcamp, participants will learn to improve the performance of an application step-by-step, taking cues from profilers along the way. Moreover, understanding of the underlying technologies and communication topology will help us utilize high-performance NVIDIA libraries to extract more performance out of the system.
NOTE: The NCCL, NVSHMEM, and Final Remarks notebooks are a work in progress. All other notebooks are available.
This bootcamp requires a multi-node system with multiple GPUs in each node (at least 2 GPUs per node).
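The GPU requirement above can be checked on each node before starting. The following is a minimal sketch, not part of the bootcamp itself; `count_gpus` and `meets_gpu_requirement` are hypothetical helper names, and the script assumes `nvidia-smi` is on `PATH` wherever a driver is installed.

```shell
#!/usr/bin/env bash
# Sketch: check that the current node exposes at least 2 GPUs.
# count_gpus and meets_gpu_requirement are hypothetical helpers.

count_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # "nvidia-smi -L" prints one "GPU <n>: ..." line per device
        nvidia-smi -L 2>/dev/null | grep -c '^GPU '
    else
        # No driver tools found: report zero GPUs
        echo 0
    fi
}

meets_gpu_requirement() {
    # $1 = number of GPUs found; the bootcamp asks for at least 2 per node
    [ "$1" -ge 2 ]
}

n=$(count_gpus)
if meets_gpu_requirement "$n"; then
    echo "OK: $n GPUs visible on $(uname -n)"
else
    echo "Only $n GPU(s) visible on $(uname -n); at least 2 are needed"
fi
```

Run the script on every node that will take part in a multi-node job, since each node must meet the requirement on its own.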
A multi-node installation of NVIDIA's HPC SDK is desired.
Otherwise, multi-node compatible versions of the following are required:
All code has been tested with CUDA driver 460.32.03, CUDA 11.3.0.0, OpenMPI 4.1.1, HPCX 2.8.1, Singularity 3.6.1, and NCCL 2.9.9.1.
As this bootcamp covers multi-node CUDA-aware MPI concepts, it is primarily designed to run without containers. After the prerequisite software has been installed, follow these steps to install Jupyter Lab:
# Install Miniconda3
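To compare a local installation against the tested versions listed above, a short script can print what each tool reports. This is a sketch under assumptions: `report_version` is a hypothetical helper, the tools are assumed to be on `PATH`, and NCCL and HPCX are not covered because they have no comparable version command.

```shell
#!/usr/bin/env bash
# Sketch: print the version reported by each prerequisite tool.
# report_version is a hypothetical helper, not part of the bootcamp.

report_version() {
    # $1 = tool name; "$@" = full command that prints version information
    tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        # Keep the first output line that contains something like "11.3"
        version_line=$("$@" 2>&1 | grep -E '[0-9]+\.[0-9]+' | head -n 1)
        echo "$tool: ${version_line:-unknown}"
    else
        echo "$tool: not found"
    fi
}

report_version nvidia-smi --query-gpu=driver_version --format=csv,noheader
report_version nvcc --version
report_version mpirun --version
report_version singularity --version
```

A tool that prints `not found` is either missing or not on `PATH` in the current shell; on clusters that use environment modules, the relevant module may need to be loaded first.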
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p <my_dir>
# Add conda to PATH
export PATH=$PATH:<my_dir>/bin/
# Install Jupyter Lab
conda install -c conda-forge jupyterlab
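Because the labs rely on CUDA-aware MPI, it can be worth confirming that the Open MPI build in use was compiled with CUDA support before opening the notebooks. The sketch below assumes an Open MPI based installation that provides `ompi_info`; `is_cuda_aware` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check whether Open MPI reports that it was built with CUDA support.
# is_cuda_aware is a hypothetical helper, not part of the bootcamp.

is_cuda_aware() {
    # Reads "ompi_info --parsable" style output on stdin and succeeds
    # if the build-time CUDA support flag is true.
    grep -q 'mpi_built_with_cuda_support:value:true'
}

if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info --parsable --all | is_cuda_aware; then
        echo "Open MPI reports CUDA-aware support"
    else
        echo "Open MPI does not report CUDA-aware support"
    fi
else
    echo "ompi_info not found; is Open MPI on PATH?"
fi
```

This only reflects how Open MPI was built. It does not guarantee that GPU buffers take the fastest path at run time, which the profiling exercises in the labs are better suited to reveal.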
If containerization is desired, follow the steps outlined in the notebook MPI in Containerized Environments. Build the Singularity container image with:
singularity build multi_gpu_nways.simg Singularity
To start Jupyter Lab without a container, run:
jupyter lab --notebook-dir=./labs/ --port=8000 --ip=0.0.0.0 --no-browser --NotebookApp.token=""
To start Jupyter Lab from the container instead, run:
singularity run --nv multi_gpu_nways.simg jupyter lab --notebook-dir=./labs/ --port=8000 --ip=0.0.0.0 --no-browser --NotebookApp.token=""
After Jupyter Lab has started, open http://localhost:8000 in a web browser (the port set by --port in the commands above) and start the introduction.ipynb notebook.
Please join the OpenACC Slack channel to raise questions.
If you observe any errors or issues, please file an issue on the GPUBootcamp GitHub repository.