@@ -18,31 +18,65 @@ Scaling applications to multiple GPUs across multiple nodes requires one to be a
- Supplemental: Configuring MPI in a containerized environment
* NVIDIA Collective Communications Library (NCCL)
* NVSHMEM Library
-* Final remarks
-
-**NOTE:** NCCL, NVSHMEM, and Final Remarks notebooks are work under progress. All other notebooks are available.
## Prerequisites
This bootcamp requires a multi-node system with multiple GPUs in each node (at least 2 GPUs per node).
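+As an optional sanity check, you can list the GPUs visible on a node (this assumes the NVIDIA driver is already installed on the node), for example:
+
+```bash
+# Expect at least 2 GPUs to be listed on every node
+nvidia-smi -L
+```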
-A multi-node installation of [NVIDIA's HPC SDK](https://developer.nvidia.com/hpc-sdk) is desired.
+### Using NVIDIA HPC SDK
+
+A multi-node installation of [NVIDIA's HPC SDK](https://developer.nvidia.com/hpc-sdk) is required for this approach. Refer to the [NVIDIA HPC SDK Installation Guide](https://docs.nvidia.com/hpc-sdk/hpc-sdk-install-guide/index.html) for detailed instructions, and ensure that your installation includes HPCX with UCX.
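+For example, one quick way to confirm that HPCX and UCX are present is to inspect the SDK's `comm_libs` directory (the `21.5` and `hpcx-2.8.1` version directories below are the ones assumed throughout this guide and may differ on your system):
+
+```bash
+# HPCX ships inside the HPC SDK under comm_libs
+ls <path-to-nvidia-hpc-sdk>/Linux_x86_64/21.5/comm_libs/hpcx/
+# Print the version of the bundled UCX
+<path-to-nvidia-hpc-sdk>/Linux_x86_64/21.5/comm_libs/hpcx/hpcx-2.8.1/ucx/bin/ucx_info -v
+```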
+
+After installation, add the HPC SDK to your environment as follows:
+
+```bash
+# Add HPC-SDK to PATH:
+export PATH="<path-to-nvidia-hpc-sdk>/Linux_x86_64/21.5/compilers/bin:<path-to-nvidia-hpc-sdk>/Linux_x86_64/21.5/cuda/bin:$PATH"
+# Add HPC-SDK to LD_LIBRARY_PATH:
+export LD_LIBRARY_PATH="<path-to-nvidia-hpc-sdk>/Linux_x86_64/21.5/comm_libs/nvshmem/lib:<path-to-nvidia-hpc-sdk>/Linux_x86_64/21.5/comm_libs/nccl/lib:<path-to-nvidia-hpc-sdk>/Linux_x86_64/21.5/comm_libs/mpi/lib:<path-to-nvidia-hpc-sdk>/Linux_x86_64/21.5/math_libs/lib64:<path-to-nvidia-hpc-sdk>/Linux_x86_64/21.5/compilers/lib:<path-to-nvidia-hpc-sdk>/Linux_x86_64/21.5/cuda/extras/CUPTI/lib64:<path-to-nvidia-hpc-sdk>/Linux_x86_64/21.5/cuda/lib64:$LD_LIBRARY_PATH"
+```
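+To verify that the HPC SDK toolchain is now picked up from `PATH`, you can, for example, run:
+
+```bash
+# Both should report compilers from the HPC SDK installation added above
+nvc++ --version
+nvcc --version
+```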
+**Note:** If you don't use the Slurm workload manager, remove the `--with-slurm` flag from the OpenMPI configure command below.
+
+Then, install OpenMPI as follows:
+
+```bash
+# Download and extract the OpenMPI tarball
+wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.1.tar.gz
+tar -xvzf openmpi-4.1.1.tar.gz
+cd openmpi-4.1.1/
+mkdir -p build
+# Configure OpenMPI
+./configure --prefix=$PWD/build --with-libevent=internal --with-xpmem --with-cuda=<path-to-nvidia-hpc-sdk>/Linux_x86_64/21.5/cuda/ --with-slurm --enable-mpi1-compatibility --with-verbs --with-hcoll=<path-to-nvidia-hpc-sdk>/Linux_x86_64/21.5/comm_libs/hpcx/hpcx-2.8.1/hcoll/ --with-ucx=<path-to-nvidia-hpc-sdk>/Linux_x86_64/21.5/comm_libs/hpcx/hpcx-2.8.1/ucx/
+# Install OpenMPI
+make all install
+```
+
+Now, add OpenMPI to the environment:
+
+```bash
+export PATH="<path-to-openmpi>/build/bin/:$PATH"
+export LD_LIBRARY_PATH="<path-to-openmpi>/build/lib:$LD_LIBRARY_PATH"
+```
+
+Ensure that the custom-built OpenMPI is in use by running `which mpirun`, which should point to the `mpirun` binary in the `<path-to-openmpi>/build/bin` directory.
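+For example, a minimal check of both the launcher path and the reported version:
+
+```bash
+# Should print <path-to-openmpi>/build/bin/mpirun
+which mpirun
+# Should report Open MPI 4.1.1
+mpirun --version
+```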
-Otherwise, multi-node compatible versions of the following are required:
+### Without Using NVIDIA HPC SDK
+
+Multi-node compatible versions of the following are required:
-* [Singularity](https://sylabs.io/docs/%5D)
* [OpenMPI](https://www.open-mpi.org/)
* [HPCX](https://developer.nvidia.com/networking/hpc-x)
* [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit)
* [NCCL](https://developer.nvidia.com/nccl)
+* [NVSHMEM](https://developer.nvidia.com/nvshmem)
## Testing
-We have tested all the codes with CUDA drivers 460.32.03 with CUDA 11.3.0.0, OpenMPI 4.1.1, HPCX 2.8.1, Singularity 3.6.1, and NCCL 2.9.9.1.
+We have tested all the code samples with CUDA driver 460.32.03, CUDA 11.3.0.0, OpenMPI 4.1.1, HPCX 2.8.1, Singularity 3.6.1, NCCL 2.9.9.1, and NVSHMEM 2.1.2. Note that the OpenMPI installation on our cluster was compiled with CUDA, HCOLL, and UCX support.
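+If you need to confirm that an OpenMPI build is CUDA-aware, one common check (valid for any OpenMPI installation, not specific to this bootcamp) is:
+
+```bash
+# Prints "...mpi_built_with_cuda_support:value:true" for a CUDA-aware build
+ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
+# The recorded configure line also shows whether UCX and HCOLL were enabled
+ompi_info | grep "Configure command line"
+```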
-### Containerless Build
+## Running Jupyter Lab
-As this bootcamp covers multi-node CUDA-aware MPI concepts, it is primarily designed to run without any containers. After the prerequisite softwares have been installed, follow these steps to install Jupyter Lab:
+As this bootcamp covers multi-node CUDA-aware MPI concepts, it is primarily designed to run without any containers. After the prerequisite software has been installed, follow these steps to install and run Jupyter Lab:
```bash
# Install Anaconda3
@@ -52,35 +86,28 @@ bash Miniconda3-latest-Linux-x86_64.sh -b -p <my_dir>
export PATH=$PATH:<my_dir>/bin/
# Install Jupyter Lab
conda install -c conda-forge jupyterlab
+# Run Jupyter Lab
+jupyter lab --notebook-dir=<path-to-gpubootcamp-repo>/hpc/multi_gpu_nways/labs/ --port=8000 --ip=0.0.0.0 --no-browser --NotebookApp.token=""
```
-### Containerized Build:
-
-If containerization is desired, follow the steps outlined in the notebook [MPI in Containerized Environments](labs/CFD/English/C/jupyter_notebook/mpi/containers_and_mpi.ipynb).
-
-#### Building Singularity Container
-
-```bash
-singularity build multi_gpu_nways.simg Singularity
-```
+After running Jupyter Lab, open [http://localhost:8000](http://localhost:8000/) in a web browser and start the `introduction.ipynb` notebook.
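+If Jupyter Lab is running on a remote cluster node rather than on your local machine, you may first need to forward the port over SSH; the hostnames below are placeholders for your site:
+
+```bash
+# Forward local port 8000 to port 8000 on the node running Jupyter Lab
+ssh -L 8000:<compute-node-hostname>:8000 <username>@<cluster-login-node>
+```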
-### Running Jupyter Lab
+### Containerized Build with Singularity
-#### Containerless Build
+**Note:** This material is designed primarily to run in containerless environments, that is, directly on the cluster, so building the Singularity container is OPTIONAL.
-```bash
-jupyter lab --notebook-dir=./labs/ --port=8000 --ip=0.0.0.0 --no-browser --NotebookApp.token=""
-```
+If containerization is desired, follow the steps outlined in the notebook [MPI in Containerized Environments](labs/CFD/English/C/jupyter_notebook/mpi/containers_and_mpi.ipynb).
-#### Containerized Build
+Follow the steps below to build the Singularity container image and run Jupyter Lab:
```bash
-singularity run --nv multi_gpu_nways.simg jupyter lab --notebook-dir=./labs/ --port=8000 --ip=0.0.0.0 --no-browser --NotebookApp.token=""
+# Build the container
+singularity build multi_gpu_nways.simg Singularity
+# Run Jupyter Lab
+singularity run --nv multi_gpu_nways.simg jupyter lab --notebook-dir=<path-to-gpubootcamp-repo>/hpc/multi_gpu_nways/labs/ --port=8000 --ip=0.0.0.0 --no-browser --NotebookApp.token=""
```
-### Accessing Bootcamp
-
-After running Jupyter Lab, open [http://localhost:8888](http://localhost:8888/) in a web browser and start the `introduction.ipynb` notebook.
+Then, access Jupyter Lab at [http://localhost:8000](http://localhost:8000/).
## Questions?