
Merge pull request #20 from bharatk-parallel/nways_md_fortran

Fortran N-Ways
Bharatkumar Sharma 2 years ago
parent
commit
3116c6d639
100 changed files with 1075 additions and 24 deletions
  1. 7 5
      hpc/nways/Dockerfile
  2. 1 1
      hpc/nways/README.md
  3. 5 3
      hpc/nways/Singularity
  4. 1 1
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/cudac/nways_cuda.ipynb
  5. 7 7
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/openacc/nways_openacc.ipynb
  6. 3 3
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/openmp/nways_openmp.ipynb
  7. 1 1
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/serial/rdf_overview.ipynb
  8. 2 2
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/stdpar/nways_stdpar.ipynb
  9. 1 1
      hpc/nways/nways_labs/nways_MD/English/C/source_code/serial/Makefile
  10. 115 0
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/Final_Remarks.ipynb
  11. 50 0
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/GPU_Architecture_Terminologies.ipynb
  12. 522 0
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/cudafortran/nways_cuda.ipynb
  13. 351 0
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/doconcurrent/nways_doconcurrent.ipynb
  14. 9 0
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/fidentify.log
  15. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/Nsight Diagram.png
  16. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/Optimization_Cycle.jpg
  17. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/UM.png
  18. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/allsection-compute.png
  19. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/baseline-compute.png
  20. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/charts-compute.png
  21. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cli-out.png
  22. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/coalesced_mem.png
  23. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/collapse_feedback.png
  24. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/collapse_pre.png
  25. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/collapse_thread.png
  26. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/compute.png
  27. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/compute_analyz.png
  28. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/compute_command.png
  29. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/compute_command_line.png
  30. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/compute_open.png
  31. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cpu.png
  32. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda.png
  33. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda_hw_sw.png
  34. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda_indexing.png
  35. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda_profile.png
  36. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda_profile_api.png
  37. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda_profile_timeline.jpg
  38. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda_vec_add.png
  39. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda_vec_add2.png
  40. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/data_feedback.png
  41. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/data_thread.png
  42. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/diagram.png
  43. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/do_concurrent_gpu.jpg
  44. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/do_concurrent_multicore.jpg
  45. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/gang_128.png
  46. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/gang_256.png
  47. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/gang_32.png
  48. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/gang_vector.png
  49. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/gpu_feedback.png
  50. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/kernel_feedback.png
  51. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/kernel_indep_feedback.png
  52. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/kokkos_abstraction.png
  53. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/kokkos_ecosystem.png
  54. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/kokkos_mirror_view.png
  55. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/laplas3.png
  56. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/launch-compute.png
  57. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/nsight_open.png
  58. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/nvtx.PNG
  59. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/nvtx_gpu.png
  60. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/nvtx_multicore.jpg
  61. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/nvtx_serial.jpg
  62. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc correlation.jpg
  63. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc_3_directives.png
  64. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc_construct.jpg
  65. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc_copyclause.png
  66. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc_multicore_feedback.png
  67. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc_parallel.png
  68. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc_parallel2.png
  69. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc_parallel_loop.png
  70. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_feedback.png
  71. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_feedback_collapse.png
  72. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_feedback_multicore.png
  73. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_fork_join.png
  74. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_gpu.png
  75. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_gpu_collapse.png
  76. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_multicore.png
  77. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_parallel_construct.png
  78. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_parallelfor_construct.png
  79. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_target_distribute.png
  80. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_target_teams.png
  81. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_teams.png
  82. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_teams_for.png
  83. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/page-compute.png
  84. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel1f.png
  85. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel2f.png
  86. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel3f.png
  87. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel_data.jpg
  88. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel_data_feedback.png
  89. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel_detailed.png
  90. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel_expand.jpg
  91. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel_loop.png
  92. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel_timeline.jpg
  93. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel_unified.jpg
  94. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/rdf.png
  95. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/roofline_collapse.png
  96. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/rule-compute.png
  97. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/scheduler_collapse.png
  98. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/serial.jpg
  99. BIN
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/sol.png
  100. 0 0
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/sol_baseline.png

+ 7 - 5
hpc/nways/Dockerfile

@@ -1,10 +1,11 @@
 # Copyright (c) 2021 NVIDIA Corporation.  All rights reserved. 
 
-# To build the docker container, run: $ sudo docker build -t openacc-labs:latest .
-# To run: $ sudo docker run --rm -it --runtime nvidia -p 8888:8888 openacc-labs:latest
+# To build the docker container, run: $ sudo docker build -t nways-labs:latest .
+# To run: $ sudo docker run --rm -it --runtime nvidia -p 8888:8888 nways-labs:latest
 # Finally, open http://localhost:8888/
 
-FROM nvcr.io/nvidia/nvhpc:20.11-devel-cuda_multi-ubuntu20.04
+#FROM nvcr.io/nvidia/nvhpc:20.11-devel-cuda_multi-ubuntu20.04
+FROM nvcr.io/nvidia/nvhpc:21.3-devel-cuda_multi-ubuntu20.04
 
 RUN apt-get -y update && \
         DEBIAN_FRONTEND=noninteractive apt-get -yq install --no-install-recommends python3-pip python3-setuptools nginx zip make build-essential libtbb-dev && \
@@ -31,10 +32,11 @@ RUN DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends ns
 COPY nways_labs/ /labs/
 
 RUN python3 /labs/nways_MD/English/C/source_code/dataset.py
+RUN python3 /labs/nways_MD/English/Fortran/source_code/dataset.py
 
 #################################################
-ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/20.11/cuda/11.0/lib64/"
-ENV PATH="/opt/nvidia/nsight-systems/2020.5.1/bin:/opt/nvidia/nsight-compute/2020.2.1:/opt/nvidia/hpc_sdk/Linux_x86_64/20.11/cuda/11.0/include:/usr/local/bin:/opt/anaconda3/bin:/usr/bin:$PATH"
+ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64/"
+ENV PATH="/opt/nvidia/nsight-systems/2020.5.1/bin:/opt/nvidia/nsight-compute/2020.2.1:/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include:/usr/local/bin:/opt/anaconda3/bin:/usr/bin:$PATH"
 #################################################
 
 ADD nways_labs/ /labs
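
Review note (a minimal sketch, not part of the commit): this bump rewrites hard-coded `hpc_sdk/Linux_x86_64/<sdk>/cuda/<cuda>` paths in the Dockerfile, the Singularity recipe, a Makefile and several notebooks, so a quick consistency check over the added lines may be worth running. The Python below is one hypothetical way to do that. The names `SDK_CUDA_PATH`, `cuda_versions_by_sdk` and `mismatched` are illustrative and do not exist in the repository, and the sample lines in the demo are made up.

```python
import re
from collections import defaultdict

# Matches paths such as .../hpc_sdk/Linux_x86_64/21.3/cuda/11.2/...
SDK_CUDA_PATH = re.compile(
    r"hpc_sdk/Linux_x86_64/(?P<sdk>[\d.]+)/cuda/(?P<cuda>[\d.]+)"
)


def cuda_versions_by_sdk(lines):
    """Map each HPC SDK version to the set of CUDA versions paired with it.

    Lines that a diff marks as removed (leading '-') are skipped, so only
    added and context lines are inspected.
    """
    found = defaultdict(set)
    for line in lines:
        if line.lstrip().startswith("-"):
            continue
        for match in SDK_CUDA_PATH.finditer(line):
            found[match.group("sdk")].add(match.group("cuda"))
    return dict(found)


def mismatched(found):
    """Return only the SDK versions paired with more than one CUDA version."""
    return {sdk: sorted(cudas) for sdk, cudas in found.items() if len(cudas) > 1}


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        '+ENV A="/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.3/lib64/"',
        '+ENV B="/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include"',
        '-ENV B="/opt/nvidia/hpc_sdk/Linux_x86_64/20.11/cuda/11.0/include"',
    ]
    print(mismatched(cuda_versions_by_sdk(sample)))
```

An empty result from `mismatched` means every SDK version found is paired with a single CUDA directory. It does not prove that the directory exists in the base image.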

File diff suppressed because it is too large
+ 1 - 1
hpc/nways/README.md


+ 5 - 3
hpc/nways/Singularity

@@ -1,13 +1,14 @@
 # Copyright (c) 2021 NVIDIA Corporation.  All rights reserved. 
 
 Bootstrap: docker
-FROM: nvcr.io/nvidia/nvhpc:20.11-devel-cuda_multi-ubuntu20.04
+#FROM: nvcr.io/nvidia/nvhpc:20.11-devel-cuda_multi-ubuntu20.04
+FROM: nvcr.io/nvidia/nvhpc:21.3-devel-cuda_multi-ubuntu20.04
 
 %environment
     export XDG_RUNTIME_DIR=
     export PATH="$PATH:/usr/local/bin:/opt/anaconda3/bin:/usr/bin"
     export PATH=/opt/nvidia/nsight-systems/2020.5.1/bin:/opt/nvidia/nsight-compute/2020.2.1:$PATH
-    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/20.11/cuda/11.0/lib64/"
+    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64/"
 
 %post
     build_tmp=$(mktemp -d) && cd ${build_tmp}
@@ -28,6 +29,7 @@ FROM: nvcr.io/nvidia/nvhpc:20.11-devel-cuda_multi-ubuntu20.04
     apt-get install --no-install-recommends -y build-essential 
 
     python3 /labs/nways_MD/English/C/source_code/dataset.py
+    python3 /labs/nways_MD/English/Fortran/source_code/dataset.py
 
 
 # NVIDIA nsight-systems-2020.5.1 ,nsight-compute-2
@@ -52,4 +54,4 @@ FROM: nvcr.io/nvidia/nvhpc:20.11-devel-cuda_multi-ubuntu20.04
     "$@"
 
 %labels
-    AUTHOR mozhgank
+    AUTHOR mozhgank

+ 1 - 1
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/cudac/nways_cuda.ipynb

@@ -418,7 +418,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.4"
+   "version": "3.6.2"
   }
  },
  "nbformat": 4,

+ 7 - 7
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/openacc/nways_openacc.ipynb

@@ -150,7 +150,7 @@
    "outputs": [],
    "source": [
     "#Compile the code for multicore\n",
-    "!cd ../../source_code/openacc && nvc++ -acc -ta=multicore -Minfo=accel -o rdf rdf.cpp -I/opt/nvidia/hpc_sdk/Linux_x86_64/20.11/cuda/11.0/include -L/opt/nvidia/hpc_sdk/Linux_x86_64/20.11/cuda/11.0/lib64 -lnvToolsExt"
+    "!cd ../../source_code/openacc && nvc++ -acc -ta=multicore -Minfo=accel -o rdf rdf.cpp -I/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include -L/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64 -lnvToolsExt"
    ]
   },
   {
@@ -202,7 +202,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Let's checkout the profiler's report. [Download the profiler output](../../source_code/openacc/rdf_multicore.qdrep) and open it via the GUI. From the timeline view, checkout the NVTX markers displays as part of threads. **Why are we using NVTX?** Please see the section on [Using NVIDIA Tools Extension (NVTX)](../profiling-c.ipynb#Using-NVIDIA-Tools-Extension-(NVTX)).\n",
+    "Let's checkout the profiler's report. [Download the profiler output](../../source_code/openacc/rdf_multicore.qdrep) and open it via the GUI. From the timeline view, checkout the NVTX markers displays as part of threads. **Why are we using NVTX?** Please see the section on [Using NVIDIA Tools Extension (NVTX)](../../../../../profiler/English/jupyter_notebook/profiling-c.ipynb#Using-NVIDIA-Tools-Extension-(NVTX)).\n",
     "\n",
     "From the timeline view, right click on the nvtx row and click the \"show in events view\". Now you can see the nvtx statistic at the bottom of the window which shows the duration of each range. \n",
     "\n",
@@ -235,7 +235,7 @@
    "outputs": [],
    "source": [
     "#compile for Tesla GPU\n",
-    "!cd ../../source_code/openacc && nvc++ -acc -ta=tesla:managed,lineinfo  -Minfo=accel -o rdf rdf.cpp -L/opt/nvidia/hpc_sdk/Linux_x86_64/20.11/cuda/11.0/lib64 -lnvToolsExt"
+    "!cd ../../source_code/openacc && nvc++ -acc -ta=tesla:managed,lineinfo  -Minfo=accel -o rdf rdf.cpp -L/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64 -lnvToolsExt"
    ]
   },
   {
@@ -316,7 +316,7 @@
     "\n",
     "| Compiler | Latest Version | Maintained by | Full or Partial Support |\n",
     "| --- | --- | --- | --- |\n",
-    "| HPC SDK| 20.11 | NVIDIA HPC SDK | Full 2.5 spec |\n",
+    "| HPC SDK| 21.3 | NVIDIA HPC SDK | Full 2.5 spec |\n",
     "| GCC | 10 | Mentor Graphics, SUSE | 2.0 spec, Limited Kernel directive support, No Unified Memory |\n",
     "| CCE| latest | Cray | 2.0 Spec | \n"
    ]
@@ -362,7 +362,7 @@
    "outputs": [],
    "source": [
     "#compile for Tesla GPU\n",
-    "!cd ../../source_code/openacc && nvc++ -acc -ta=tesla:managed,lineinfo  -Minfo=accel -o rdf rdf.cpp -L/opt/nvidia/hpc_sdk/Linux_x86_64/20.11/cuda/11.0/lib64 -lnvToolsExt"
+    "!cd ../../source_code/openacc && nvc++ -acc -ta=tesla:managed,lineinfo  -Minfo=accel -o rdf rdf.cpp -L/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64 -lnvToolsExt"
    ]
   },
   {
@@ -549,7 +549,7 @@
    "outputs": [],
    "source": [
     "#compile for Tesla GPU without managed memory\n",
-    "!cd ../../source_code/openacc && nvc++ -acc -ta=tesla,lineinfo -Minfo=accel -o rdf rdf.cpp -L/opt/nvidia/hpc_sdk/Linux_x86_64/20.11/cuda/11.0/lib64 -lnvToolsExt"
+    "!cd ../../source_code/openacc && nvc++ -acc -ta=tesla,lineinfo -Minfo=accel -o rdf rdf.cpp -L/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64 -lnvToolsExt"
    ]
   },
   {
@@ -690,7 +690,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.4"
+   "version": "3.6.2"
   }
  },
  "nbformat": 4,
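
Review note (a sketch under stated assumptions, not part of the commit): the compile cells in this notebook repeat the SDK and CUDA versions inside each command string, which is why a version bump has to touch every cell. One possible alternative is to assemble the command from two constants. The helper names below are hypothetical; the flags are copied from the GPU compile cells in this notebook.

```python
# Hypothetical constants: the versions this commit moves the labs to.
SDK_VERSION = "21.3"
CUDA_VERSION = "11.2"


def cuda_root(sdk=SDK_VERSION, cuda=CUDA_VERSION):
    """Return the CUDA directory inside the HPC SDK install tree."""
    return f"/opt/nvidia/hpc_sdk/Linux_x86_64/{sdk}/cuda/{cuda}"


def openacc_gpu_compile_command(source="rdf.cpp", output="rdf", managed=True):
    """Build the nvc++ command line used by the GPU compile cells.

    managed=True selects -ta=tesla:managed,lineinfo (managed memory);
    managed=False selects -ta=tesla,lineinfo (explicit data movement).
    """
    target = "tesla:managed,lineinfo" if managed else "tesla,lineinfo"
    args = [
        "nvc++", "-acc", f"-ta={target}", "-Minfo=accel",
        "-o", output, source,
        f"-L{cuda_root()}/lib64", "-lnvToolsExt",
    ]
    return " ".join(args)


if __name__ == "__main__":
    print(openacc_gpu_compile_command())
```

In a notebook cell the returned string could then be handed to the shell, for example through IPython's `{...}` expansion in a `!` command. Whether that indirection suits the labs, where readers are meant to see the full command, is a judgment call for the authors.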

+ 3 - 3
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/openmp/nways_openmp.ipynb

@@ -273,7 +273,7 @@
    "outputs": [],
    "source": [
     "#Compile the code for muticore\n",
-    "!cd ../../source_code/openmp && nvc++ -mp=multicore -Minfo=mp -I/opt/nvidia/hpc_sdk/Linux_x86_64/20.11/cuda/11.0/include -o rdf rdf.cpp"
+    "!cd ../../source_code/openmp && nvc++ -mp=multicore -Minfo=mp -I/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include -o rdf rdf.cpp"
    ]
   },
   {
@@ -437,7 +437,7 @@
     "| CCE| latest | Cray | 4.5 partial spec supported | \n",
     "| XL | latest | IBM | 4.5 partial spec supported |\n",
     "| Clang | 9.0 | Community | 4.5 partial spec supported |\n",
-    "| HPC SDK | 20.11 | NVIDIA HPC SDK | 5.0 spec supported |\n",
+    "| HPC SDK | 21.3 | NVIDIA HPC SDK | 5.0 spec supported |\n",
     "\n"
    ]
   },
@@ -519,7 +519,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.4"
+   "version": "3.6.2"
   }
  },
  "nbformat": 4,

+ 1 - 1
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/serial/rdf_overview.ipynb

@@ -121,7 +121,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.4"
+   "version": "3.6.2"
   }
  },
  "nbformat": 4,

+ 2 - 2
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/stdpar/nways_stdpar.ipynb

@@ -160,7 +160,7 @@
    "outputs": [],
    "source": [
     "#Compile the code for muticore\n",
-    "!cd ../../source_code/stdpar && nvc++ -std=c++17 -stdpar=multicore -o rdf rdf.cpp -I/opt/nvidia/hpc_sdk/Linux_x86_64/20.11/cuda/11.0/include -ltbb"
+    "!cd ../../source_code/stdpar && nvc++ -std=c++17 -stdpar=multicore -o rdf rdf.cpp -I/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include -ltbb"
    ]
   },
   {
@@ -378,7 +378,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.4"
+   "version": "3.6.2"
   }
  },
  "nbformat": 4,

+ 1 - 1
hpc/nways/nways_labs/nways_MD/English/C/source_code/serial/Makefile

@@ -3,7 +3,7 @@
 CC := nvc++
 CFLAGS := -O3 -w -ldl
 ACCFLAGS := -Minfo=accel
-NVTXLIB := -I/opt/nvidia/hpc_sdk/Linux_x86_64/20.11/cuda/11.0/include -L/opt/nvidia/hpc_sdk/Linux_x86_64/20.11/cuda/11.0/lib64 -lnvToolsExt
+NVTXLIB := -I/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include -L/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64 -lnvToolsExt
 
 rdf: rdf.cpp
 	${CC} ${CFLAGS} ${ACCFLAGS} -o rdf rdf.cpp ${NVTXLIB} 

+ 115 - 0
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/Final_Remarks.ipynb

@@ -0,0 +1,115 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Final Remarks\n",
+    "\n",
+    "In this tutorial we took an approach where the same algorithm was ported to the GPU using different popular methods. Each method has its strengths and serves the purpose for which it was created. From a developer's point of view, the following key parameters are crucial to any development exercise:\n",
+    "\n",
+    "- **Ease of Programming**: How much in-depth knowledge of processor architecture does a developer need before starting to convert the code to GPU?\n",
+    "- **Performance**: How much effort is required to reach desirable performance on a particular architecture?\n",
+    "- **Portability**: To what extent does the same code run on multiple architectures? What provisions does the programming approach provide to target different platforms?\n",
+    "- **Support**: The overall ecosystem and support by the community.\n",
+    "    - Which compilers implement the standard?\n",
+    "    - Which languages are supported?\n",
+    "    - Which applications make use of it?\n",
+    "    - How easy or difficult is it to profile/debug the application?\n",
+    "    \n",
+    "Let us try to create high-level buckets for each of the parameters above, with the scope limited to GPU support:\n",
+    "\n",
+    "| | |  |  | \n",
+    "| :--- | :--- | :--- | :--- |\n",
+    "| Ease of Programming | High: Minimal architecture-specific knowledge needed | Intermediate: Minimal changes expected in code design. Combining these with architecture knowledge helps achieve better performance | Low: In-depth GPU architecture knowledge is a must |\n",
+    "| Performance  | Depends: Based on the complexity/type of application the performance may vary | High: Exposes methods to get good performance. These methods are an integral part of the design and are exposed to the programmer at various granularities | Best: Developers have full control over parallelism and memory access |\n",
+    "| Portability | Integral: Part of the key objective  | Limited: Works only on specific platforms | | \n",
+    "| Support | Established: Proven over years and supported by multiple vendors for GPU | Emerging: Gaining traction with multiple vendors for GPU  | |\n",
+    "\n",
+    "There is a very thin line between these categories, and within that limited scope and view we could categorize the different approaches as follows:\n",
+    "\n",
+    " \n",
+    "| | OpenACC | OpenMP | DO-CONCURRENT | Kokkos | CUDA Languages |\n",
+    "| --- | --- | --- | --- | --- | --- |\n",
+    "| Ease | High  | High | High  | Intermediate | Low |\n",
+    "| Performance  | Depends | Depends | Depends | High | Best |\n",
+    "| Portability | Integral  | Integral | Integral | Integral | Limited |\n",
+    "| Support | Established | Emerging | Emerging | Established | Established |\n",
+    "\n",
+    "The points below are meant to help users choose, as there is no one programming model that fits all needs.\n",
+    "\n",
+    "## Ease of Programming\n",
+    "- The directive-based OpenMP and OpenACC programming models are generally the least intrusive when applied to loops.\n",
+    "- CUDA required more rewriting effort, in particular to map the loops onto a CUDA grid of threads and thread blocks.\n",
+    "- DO-CONCURRENT also required only a minimal change: replacing the *do* loop with *do concurrent*.\n",
+    "- The overhead in terms of lines of code is smallest for OpenMP, OpenACC and DO-CONCURRENT.\n",
+    "\n",
+    "## Performance\n",
+    "While we have not gone into the details of optimization for any of these programming models, the analysis provided here is based on the general design of each programming model itself.\n",
+    "\n",
+    "- The OpenACC and OpenMP abstract models define a least common denominator for accelerator devices, but cannot represent the architectural specifics of these devices without making the language less portable.\n",
+    "- DO-CONCURRENT, on the other hand, is more abstract and gives developers less control to optimize the code.\n",
+    "\n",
+    "## Portability\n",
+    "We observed the same code being run on both multicore and GPU using OpenMP, OpenACC and DO-CONCURRENT. The point we highlight here is how a programming model supports the divergent cases where developers may choose to use different directive variants to get more performance. In a real application, the tolerance for this portability/performance trade-off will vary according to the needs of the programmer and the application.\n",
+    "- OpenMP supports [Metadirective](https://www.openmp.org/spec-html/5.0/openmpsu28.html), where the developer can choose to activate a different directive variant based on the selected condition.\n",
+    "- In OpenACC, when using the ```kernels``` construct, the compiler is responsible for mapping and partitioning the program to the underlying hardware. Since the compiler will mostly take care of the parallelization issues, the descriptive approach may generate performant code for a specific architecture. The downside is that the quality of the generated accelerated code depends significantly on the capability of the compiler used, hence the term \"may\".\n",
+    "\n",
+    "\n",
+    "## Support\n",
+    "- OpenACC implementations are present in popular compilers such as NVIDIA HPC SDK, PGI, GCC, Clang and Cray.\n",
+    "- OpenMP GPU support is currently available in a limited number of compilers, but as it is the most widely supported programming model for multicore, it is a matter of time before it comes on par with other models for GPU support.\n",
+    "- DO-CONCURRENT, being part of the ISO Fortran standard, is bound to become an integral part of most compilers supporting parallelism.\n",
+    "\n",
+    "\n",
+    "Parallel computing in general is a difficult task and requires developers not just to know a programming approach but also to think in parallel. While this tutorial provides you with a good start, it is highly recommended to go through the Profiling and Optimization bootcamps as next steps.\n",
+    "\n",
+    "-----\n",
+    "\n",
+    "# <div style=\"text-align: center ;border:3px; border-style:solid; border-color:#FF0000  ; padding: 1em\">[HOME](../../nways_MD_start.ipynb)</div>\n",
+    "\n",
+    "-----\n",
+    "\n",
+    "# Links and Resources\n",
+    "[OpenACC API guide](https://www.openacc.org/sites/default/files/inline-files/OpenACC%20API%202.6%20Reference%20Guide.pdf)\n",
+    "\n",
+    "[NVIDIA Nsight Systems](https://docs.nvidia.com/nsight-systems/)\n",
+    "\n",
+    "[NVIDIA Nsight Compute](https://developer.nvidia.com/nsight-compute)\n",
+    "\n",
+    "[CUDA Toolkit Download](https://developer.nvidia.com/cuda-downloads)\n",
+    "\n",
+    "**NOTE**: To be able to see the Nsight Systems profiler output, please download the latest version of Nsight Systems from [here](https://developer.nvidia.com/nsight-systems).\n",
+    "\n",
+    "Don't forget to check out additional [OpenACC Resources](https://www.openacc.org/resources) and join our [OpenACC Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.\n",
+    "\n",
+    "--- \n",
+    "\n",
+    "## Licensing \n",
+    "\n",
+    "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0). "
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}

+ 50 - 0
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/GPU_Architecture_Terminologies.ipynb

@@ -0,0 +1,50 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Unified Memory\n",
+    "\n",
+    "With every new CUDA and GPU architecture release, new features are added. These new features provide more performance and ease of programming, or allow developers to implement new algorithms that otherwise weren't possible to port to GPUs using CUDA.\n",
+    "One such important feature, released from CUDA 6.0 onward and implemented starting with the Kepler GPU architecture, is unified memory (UM).\n",
+    "\n",
+    "In simpler words, UM provides the user with a view of a single memory space that is accessible by all GPUs and CPUs in the system. This is illustrated in the following diagram:\n",
+    "\n",
+    "<img src=\"./images/UM.png\">\n",
+    "\n",
+    "UM simplifies programming for beginners to CUDA, as developers need not explicitly manage copying data to and from the GPU. We will use this feature of the latest CUDA release and GPU architecture in the labs."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Licensing \n",
+    "\n",
+    "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0). "
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}

File diff suppressed because it is too large
+ 522 - 0
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/cudafortran/nways_cuda.ipynb


File diff suppressed because it is too large
+ 351 - 0
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/doconcurrent/nways_doconcurrent.ipynb


+ 9 - 0
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/fidentify.log

@@ -0,0 +1,9 @@
+
+
+Sun May 16 17:30:07 2021
+Command line: fidentify
+
+fidentify 7.1, Data Recovery Utility, July 2019
+Christophe GRENIER <grenier@cgsecurity.org>
+https://www.cgsecurity.org
+748 first-level signatures enabled

BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/Nsight Diagram.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/Optimization_Cycle.jpg


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/UM.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/allsection-compute.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/baseline-compute.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/charts-compute.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cli-out.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/coalesced_mem.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/collapse_feedback.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/collapse_pre.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/collapse_thread.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/compute.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/compute_analyz.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/compute_command.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/compute_command_line.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/compute_open.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cpu.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda_hw_sw.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda_indexing.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda_profile.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda_profile_api.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda_profile_timeline.jpg


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda_vec_add.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/cuda_vec_add2.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/data_feedback.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/data_thread.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/diagram.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/do_concurrent_gpu.jpg


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/do_concurrent_multicore.jpg


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/gang_128.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/gang_256.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/gang_32.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/gang_vector.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/gpu_feedback.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/kernel_feedback.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/kernel_indep_feedback.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/kokkos_abstraction.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/kokkos_ecosystem.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/kokkos_mirror_view.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/laplas3.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/launch-compute.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/nsight_open.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/nvtx.PNG


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/nvtx_gpu.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/nvtx_multicore.jpg


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/nvtx_serial.jpg


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc correlation.jpg


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc_3_directives.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc_construct.jpg


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc_copyclause.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc_multicore_feedback.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc_parallel.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc_parallel2.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openacc_parallel_loop.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_feedback.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_feedback_collapse.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_feedback_multicore.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_fork_join.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_gpu.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_gpu_collapse.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_multicore.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_parallel_construct.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_parallelfor_construct.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_target_distribute.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_target_teams.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_teams.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/openmp_teams_for.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/page-compute.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel1f.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel2f.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel3f.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel_data.jpg


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel_data_feedback.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel_detailed.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel_expand.jpg


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel_loop.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel_timeline.jpg


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/parallel_unified.jpg


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/rdf.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/roofline_collapse.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/rule-compute.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/scheduler_collapse.png


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/serial.jpg


BIN
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/sol.png


+ 0 - 0
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/images/sol_baseline.png


Some files were not shown because too many files have changed