Browse Source

nsight compute,opencc&openmp opt C

Mozhgan K. Chimeh 3 years ago
parent
commit
b167447832
57 changed files, 759 additions and 316 deletions
  1. 10 12
      hpc/nways/Dockerfile
  2. 6 6
      hpc/nways/Singularity
  3. 13 2
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/GPU_Architecture_Terminologies.ipynb
  4. 2 2
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/cudac/nways_cuda.ipynb
  5. BIN
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_collapse_baseline.png
  6. BIN
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_collapse_reg.png
  7. BIN
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_collapse_reg_memory.png
  8. BIN
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_collapse_reg_occupancy.png
  9. BIN
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_collapse_reg_roofline.png
  10. BIN
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_gpu_collapse.png
  11. BIN
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_offload_collapse.png
  12. BIN
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_offload_occupancy.png
  13. BIN
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_offload_roofline.png
  14. BIN
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_offload_split_cmp.png
  15. BIN
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_offload_split_cmp2.png
  16. BIN
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_offload_split_grid.png
  17. BIN
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_warp_cmp.png
  18. BIN
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/stdpar_gpu.png
  19. 1 1
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/openacc/nways_openacc.ipynb
  20. 1 1
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/openacc/nways_openacc_opt.ipynb
  21. 14 12
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/openmp/nways_openmp.ipynb
  22. 161 36
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/openmp/nways_openmp_opt.ipynb
  23. 3 3
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/serial/rdf_overview.ipynb
  24. 1 1
      hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/stdpar/nways_stdpar.ipynb
  25. 4 1
      hpc/nways/nways_labs/nways_MD/English/C/source_code/openmp/SOLUTION/rdf_offload_collapse_num.cpp
  26. 198 0
      hpc/nways/nways_labs/nways_MD/English/C/source_code/openmp/SOLUTION/rdf_offload_split_num.cpp
  27. 3 3
      hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/serial/rdf_overview.ipynb
  28. 12 3
      hpc/nways/nways_labs/nways_MD/English/nways_MD_start.ipynb
  29. 1 1
      hpc/nways/nways_labs/nways_start.ipynb
  30. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/SOL-compute.png
  31. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/baseline-compute.png
  32. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/baseline1-compute.png
  33. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/compute-cli-1.png
  34. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/compute-cli-2.png
  35. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/compute-memory.png
  36. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/compute-memtable.png
  37. 0 0
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/compute-open.png
  38. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/compute-sections.png
  39. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/compute-sets.png
  40. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/expand-compute.png
  41. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/header-compute.png
  42. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/memory-compute.png
  43. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/nsys-compute-command.png
  44. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/nsys-compute-command1.png
  45. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/nsys-compute-command2.png
  46. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/roofline-achieved.png
  47. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/roofline-analysis.png
  48. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/roofline-baseline.png
  49. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/roofline-compute.png
  50. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/roofline-overview.png
  51. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/sass-compute.png
  52. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/sections-compute.png
  53. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/source-compute.png
  54. BIN
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/warning-compute.png
  55. 327 0
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/nsight_compute.ipynb
  56. 2 5
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/profiling.ipynb
  57. 0 227
      hpc/nways/nways_labs/profiler/English/jupyter_notebook/profiling-c.ipynb

+ 10 - 12
hpc/nways/Dockerfile

@@ -16,17 +16,14 @@ RUN apt-get -y update && \
 
 ############################################
 # NVIDIA nsight-systems-2020.5.1 ,nsight-compute-2
-RUN apt-get update -y && \
-        DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
-        apt-transport-https \
-        ca-certificates \
-        gnupg \
-        wget && \
-        apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys F60F4B3D7FA2AF80 && \
-        echo "deb https://developer.download.nvidia.com/devtools/repos/ubuntu2004/amd64/ /" >> /etc/apt/sources.list.d/nsight.list &&\
-        apt-get update -y
-
-RUN DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends nsight-systems-2020.5.1 nsight-compute-2020.2.1 
+#RUN apt-get update -y && \
+#        DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
+#       apt-transport-https ca-certificates gnupg wget && \
+#        apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys F60F4B3D7FA2AF80 && \
+#        echo "deb https://developer.download.nvidia.com/devtools/repos/ubuntu2004/amd64/ /" >> /etc/apt/sources.list.d/nsight.list &&\
+RUN apt-get update -y
+
+# RUN DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends nsight-systems-2020.5.1 nsight-compute-2020.2.1 
 
 # TO COPY the data
 COPY nways_labs/ /labs/
@@ -36,7 +33,8 @@ RUN python3 /labs/nways_MD/English/Fortran/source_code/dataset.py
 
 #################################################
 ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.3/lib64/"
-ENV PATH="/opt/nvidia/nsight-systems/2020.5.1/bin:/opt/nvidia/nsight-compute/2020.2.1:/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include:/usr/local/bin:/opt/anaconda3/bin:/usr/bin:$PATH"
+#ENV PATH="/opt/nvidia/nsight-systems/2020.5.1/bin:/opt/nvidia/nsight-compute/2020.2.1:/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include:/usr/local/bin:/opt/anaconda3/bin:/usr/bin:$PATH"
+ENV PATH="/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include:/usr/local/bin:/opt/anaconda3/bin:/usr/bin:$PATH"
 #################################################
 
 ADD nways_labs/ /labs

+ 6 - 6
hpc/nways/Singularity

@@ -7,7 +7,7 @@ FROM: nvcr.io/nvidia/nvhpc:21.3-devel-cuda_multi-ubuntu20.04
 %environment
     export XDG_RUNTIME_DIR=
     export PATH="$PATH:/usr/local/bin:/opt/anaconda3/bin:/usr/bin"
-    export PATH=/opt/nvidia/nsight-systems/2020.5.1/bin:/opt/nvidia/nsight-compute/2020.2.1:$PATH
+   # export PATH=/opt/nvidia/nsight-systems/2020.5.1/bin:/opt/nvidia/nsight-compute/2020.2.1:$PATH
     export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64/"
 
 %post
@@ -33,12 +33,12 @@ FROM: nvcr.io/nvidia/nvhpc:21.3-devel-cuda_multi-ubuntu20.04
 
 
 # NVIDIA nsight-systems-2020.5.1 ,nsight-compute-2
-    apt-get update -y   
-    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends apt-transport-https ca-certificates gnupg wget
-    apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys F60F4B3D7FA2AF80
-    echo "deb https://developer.download.nvidia.com/devtools/repos/ubuntu2004/amd64/ /" >> /etc/apt/sources.list.d/nsight.list 
+  # apt-get update -y   
+  #  DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends apt-transport-https ca-certificates gnupg wget
+  #  apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys F60F4B3D7FA2AF80
+  #  echo "deb https://developer.download.nvidia.com/devtools/repos/ubuntu2004/amd64/ /" >> /etc/apt/sources.list.d/nsight.list 
     apt-get update -y 
-    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends nsight-systems-2020.5.1 nsight-compute-2020.2.1 
+  #  DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends nsight-systems-2020.5.1 nsight-compute-2020.2.1 
     apt-get install --no-install-recommends -y build-essential
 
     wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh 

+ 13 - 2
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/GPU_Architecture_Terminologies.ipynb

@@ -4,14 +4,25 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Unified Memory\n",
+    "### Thread\n",
+    "### Block\n",
+    "### Grid\n",
+    "### Warp\n",
+    "### Occupancy"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Unified Memory\n",
     "\n",
     "With every new CUDA and GPU architecture release, new features are added. These new features provide more performance and ease of programming or allow developers to implement new algorithms that otherwise weren't possible to port on GPUs using CUDA.\n",
     "One such important feature that was released from CUDA 6.0 onward and finds its implementation from the Kepler GPU architecture is unified memory (UM). \n",
     "\n",
     "In simpler words, UM provides the user with a view of single memory space that's accessible by all GPUs and CPUs in the system. This is illustrated in the following diagram:\n",
     "\n",
-    "<img src=\"./images/UM.png\">\n",
+    "<img src=\"./images/UM.png\" width=\"80%\" height=\"80%\">\n",
     "\n",
     "UM simplifies programming effort for beginners to CUDA as developers need not explicitly manage copying data to and from GPU. We will be using this feature of latest CUDA release and GPU architecture in labs."
    ]

+ 2 - 2
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/cudac/nways_cuda.ipynb

@@ -78,7 +78,7 @@
     "\n",
     "The diagram below shows a higher level of abstraction of components of GPU hardware and its respective programming model mapping. \n",
     "\n",
-    "<img src=\"../images/cuda_hw_sw.png\">\n",
+    "<img src=\"../images/cuda_hw_sw.png\" width=\"80%\" height=\"80%\">\n",
     "\n",
     "As shown in the diagram above CUDA programming model is tightly coupled with hardware design. This makes CUDA one of the most efficient parallel programming model for shared memory systems. Another way to look at the diagram shown above is given below: \n",
     "\n",
@@ -418,7 +418,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.2"
+   "version": "3.7.4"
   }
  },
  "nbformat": 4,

BIN
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_collapse_baseline.png


BIN
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_collapse_reg.png


BIN
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_collapse_reg_memory.png


BIN
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_collapse_reg_occupancy.png


BIN
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_collapse_reg_roofline.png


BIN
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_gpu_collapse.png


BIN
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_offload_collapse.png


BIN
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_offload_occupancy.png


BIN
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_offload_roofline.png


BIN
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_offload_split_cmp.png


BIN
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_offload_split_cmp2.png


BIN
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_offload_split_grid.png


BIN
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/openmp_warp_cmp.png


BIN
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/images/stdpar_gpu.png


+ 1 - 1
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/openacc/nways_openacc.ipynb

@@ -690,7 +690,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.2"
+   "version": "3.7.4"
   }
  },
  "nbformat": 4,

+ 1 - 1
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/openacc/nways_openacc_opt.ipynb

@@ -312,7 +312,7 @@
    "outputs": [],
    "source": [
     "#profile the selected kernel in the solution with Nsight compute\n",
-    "!cd ../../source_code/openacc && ncu --set full --kernel-regex pair_gpu_182_gpu --launch-skip 1 --launch-count 1 -o rdf_collapse_solution ./rdf"
+    "!cd ../../source_code/openacc && ncu --set full -k pair_gpu --launch-skip 1 --launch-count 1 -o rdf_collapse_solution ./rdf"
    ]
   },
   {

Diffs are not shown because the file is too large
+ 14 - 12
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/openmp/nways_openmp.ipynb


Diffs are not shown because the file is too large
+ 161 - 36
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/openmp/nways_openmp_opt.ipynb


+ 3 - 3
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/serial/rdf_overview.ipynb

@@ -44,11 +44,11 @@
     "\n",
     "-----\n",
     "\n",
-    "# <div style=\"text-align: center ;border:3px; border-style:solid; border-color:#FF0000  ; padding: 1em\">[Profiling lab](../../../../../profiler/English/jupyter_notebook/profiling-c.ipynb)</div> \n",
+    "# <div style=\"text-align: center ;border:3px; border-style:solid; border-color:#FF0000  ; padding: 1em\">[Profiling lab](../../../../../profiler/English/jupyter_notebook/nsight_systems.ipynb)</div> \n",
     "\n",
     "-----\n",
     "\n",
-    "Now, that we are familiar with the Nsight Profiler and know how to [NVTX](../../../../../profiler/English/jupyter_notebook/profiling-c.ipynb#nvtx), let's profile the serial code and checkout the output."
+    "Now, that we are familiar with the Nsight Profiler and know how to [NVTX](../../../../../profiler/English/jupyter_notebook/nsight_systems.ipynb#nvtx), let's profile the serial code and checkout the output."
    ]
   },
   {
@@ -121,7 +121,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.2"
+   "version": "3.7.4"
   }
  },
  "nbformat": 4,

+ 1 - 1
hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/stdpar/nways_stdpar.ipynb

@@ -287,7 +287,7 @@
     "\n",
     "If you inspect the output of the profiler closer, you can see the usage of *Unified Memory* annotated with green rectangle which was explained in previous sections.\n",
     "\n",
-    "Moreover, if you compare the NVTX marker `Pair_Calculation` (from the NVTX row) in both multicore and GPU version, you can see how much improvement you achieved. In the *example screenshot*, we were able to reduce that range from 1.52 seconds to 188.4 mseconds.\n",
+    "Moreover, if you compare the NVTX marker `Pair_Calculation` (from the NVTX row) in both multicore and GPU version, you can see how much improvement you achieved. In the *example screenshot*, we were able to reduce that range from 1.52 seconds to 48.8 mseconds.\n",
     "\n",
     "Feel free to checkout the [solution](../../source_code/stdpar/SOLUTION/rdf.cpp) to help you understand better or compare your implementation with the sample solution."
    ]

+ 4 - 1
hpc/nways/nways_labs/nways_MD/English/C/source_code/openmp/SOLUTION/rdf_offload_collapse_num.cpp

@@ -166,12 +166,15 @@ void pair_gpu(const double *d_x, const double *d_y, const double *d_z,
     cut = box * 0.5;
     int count = 0;
     printf("\n %d %d ", nconf, numatm);
+
     for (int frame = 0; frame < nconf; frame++)
     {
         printf("\n %d  ", frame);
-#pragma omp target teams distribute parallel for private(dx, dy, dz, r, ig2) collapse(2) num_threads(256)
+#pragma omp target teams distribute
+
         for (int id1 = 0; id1 < numatm; id1++)
         {
+#pragma omp parallel for private(dx, dy, dz, r, ig2)
             for (int id2 = 0; id2 < numatm; id2++)
             {
                 dx = d_x[frame * numatm + id1] - d_x[frame * numatm + id2];

+ 198 - 0
hpc/nways/nways_labs/nways_MD/English/C/source_code/openmp/SOLUTION/rdf_offload_split_num.cpp

@@ -0,0 +1,198 @@
+#include <stdio.h>
+#include <iostream>
+#include <fstream>
+#include <math.h>
+#include <cstring>
+#include <cstdio>
+#include <iomanip>
+#include <omp.h>
+#include "dcdread.h"
+#include <assert.h>
+#include <nvtx3/nvToolsExt.h>
+
+void pair_gpu(const double *d_x, const double *d_y, const double *d_z,
+              unsigned int *d_g2, int numatm, int nconf,
+              const double xbox, const double ybox, const double zbox,
+              int d_bin);
+
+int main(int argc, char *argv[])
+{
+    double xbox, ybox, zbox;
+    double *h_x, *h_y, *h_z;
+    unsigned int *h_g2;
+    int nbin;
+    int numatm, nconf, inconf;
+    string file;
+
+    ///////////////////////////////////////////////////////////////
+
+    inconf = 10;
+    nbin = 2000;
+    file = "../input/alk.traj.dcd";
+    ///////////////////////////////////////
+    std::ifstream infile;
+    infile.open(file.c_str());
+    if (!infile)
+    {
+        cout << "file " << file.c_str() << " not found\n";
+        return 1;
+    }
+    assert(infile);
+
+    ofstream pairfile, stwo;
+    pairfile.open("RDF.dat");
+    stwo.open("Pair_entropy.dat");
+
+    /////////////////////////////////////////////////////////
+    dcdreadhead(&numatm, &nconf, infile);
+    cout << "Dcd file has " << numatm << " atoms and " << nconf << " frames" << endl;
+    if (inconf > nconf)
+        cout << "nconf is reset to " << nconf << endl;
+    else
+    {
+        nconf = inconf;
+    }
+    cout << "Calculating RDF for " << nconf << " frames" << endl;
+    ////////////////////////////////////////////////////////
+
+    unsigned long long int sizef = nconf * numatm * sizeof(double);
+    unsigned long long int sizebin = nbin * sizeof(unsigned int);
+
+    h_x = (double *)malloc(sizef);
+    h_y = (double *)malloc(sizef);
+    h_z = (double *)malloc(sizef);
+    h_g2 = (unsigned int *)malloc(sizebin);
+
+    memset(h_g2, 0, sizebin);
+
+    /////////reading cordinates//////////////////////////////////////////////
+    nvtxRangePush("Read_File");
+
+    double ax[numatm], ay[numatm], az[numatm];
+    for (int i = 0; i < nconf; i++)
+    {
+        dcdreadframe(ax, ay, az, infile, numatm, xbox, ybox, zbox);
+        for (int j = 0; j < numatm; j++)
+        {
+            h_x[i * numatm + j] = ax[j];
+            h_y[i * numatm + j] = ay[j];
+            h_z[i * numatm + j] = az[j];
+        }
+    }
+    nvtxRangePop(); //pop for REading file
+    cout << "Reading of input file is completed" << endl;
+//////////////////////////////////////////////////////////////////////////
+#pragma omp target data map(h_x [0:nconf * numatm], h_y [0:nconf * numatm], h_z [0:nconf * numatm], h_g2 [0:nbin])
+    {
+        nvtxRangePush("Pair_Calculation");
+        pair_gpu(h_x, h_y, h_z, h_g2, numatm, nconf, xbox, ybox, zbox, nbin);
+        nvtxRangePop(); //Pop for Pair Calculation
+    }
+    ////////////////////////////////////////////////////////////////////////
+    double pi = acos(-1.0);
+    double rho = (numatm) / (xbox * ybox * zbox);
+    double norm = (4.0l * pi * rho) / 3.0l;
+    double rl, ru, nideal;
+    double g2[nbin];
+    double r, gr, lngr, lngrbond, s2 = 0.0l, s2bond = 0.0l;
+    double box = min(xbox, ybox);
+    box = min(box, zbox);
+    double del = box / (2.0l * nbin);
+    nvtxRangePush("Entropy_Calculation");
+    for (int i = 0; i < nbin; i++)
+    {
+        rl = (i)*del;
+        ru = rl + del;
+        nideal = norm * (ru * ru * ru - rl * rl * rl);
+        g2[i] = (double)h_g2[i] / ((double)nconf * (double)numatm * nideal);
+        r = (i)*del;
+        pairfile << (i + 0.5l) * del << " " << g2[i] << endl;
+        if (r < 2.0l)
+        {
+            gr = 0.0l;
+        }
+        else
+        {
+            gr = g2[i];
+        }
+        if (gr < 1e-5)
+        {
+            lngr = 0.0l;
+        }
+        else
+        {
+            lngr = log(gr);
+        }
+
+        if (g2[i] < 1e-6)
+        {
+            lngrbond = 0.0l;
+        }
+        else
+        {
+            lngrbond = log(g2[i]);
+        }
+        s2 = s2 - 2.0l * pi * rho * ((gr * lngr) - gr + 1.0l) * del * r * r;
+        s2bond = s2bond - 2.0l * pi * rho * ((g2[i] * lngrbond) - g2[i] + 1.0l) * del * r * r;
+    }
+    nvtxRangePop(); //Pop for Entropy Calculation
+    stwo << "s2 value is " << s2 << endl;
+    stwo << "s2bond value is " << s2bond << endl;
+
+    cout << "#Freeing Host memory" << endl;
+    free(h_x);
+    free(h_y);
+    free(h_z);
+    free(h_g2);
+
+    cout << "#Number of atoms processed: " << numatm << endl
+         << endl;
+    cout << "#Number of confs processed: " << nconf << endl
+         << endl;
+    return 0;
+}
+void pair_gpu(const double *d_x, const double *d_y, const double *d_z,
+              unsigned int *d_g2, int numatm, int nconf,
+              const double xbox, const double ybox, const double zbox, int d_bin)
+{
+    double r, cut, dx, dy, dz;
+    int ig2;
+    double box;
+    int myround;
+    box = min(xbox, ybox);
+    box = min(box, zbox);
+
+    double del = box / (2.0 * d_bin);
+    cut = box * 0.5;
+    int count = 0;
+    printf("\n %d %d ", nconf, numatm);
+
+    for (int frame = 0; frame < nconf; frame++)
+    {
+        printf("\n %d  ", frame);
+#pragma omp target teams distribute num_teams(65535)
+
+        for (int id1 = 0; id1 < numatm; id1++)
+        {
+#pragma omp parallel for private(dx, dy, dz, r, ig2) 
+            for (int id2 = 0; id2 < numatm; id2++)
+            {
+                dx = d_x[frame * numatm + id1] - d_x[frame * numatm + id2];
+                dy = d_y[frame * numatm + id1] - d_y[frame * numatm + id2];
+                dz = d_z[frame * numatm + id1] - d_z[frame * numatm + id2];
+
+                dx = dx - xbox * (round(dx / xbox));
+                dy = dy - ybox * (round(dy / ybox));
+                dz = dz - zbox * (round(dz / zbox));
+
+                r = sqrtf(dx * dx + dy * dy + dz * dz);
+                if (r < cut)
+                {
+                    ig2 = (int)(r / del);
+#pragma omp atomic
+                    d_g2[ig2] = d_g2[ig2] + 1;
+                }
+            }
+        }
+    } //frame ends
+}
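
Note on the split directives: in the two OpenMP solution files above, the outer `id1` loop is distributed across teams with `target teams distribute`, and the inner `id2` loop is shared among the threads of each team with `parallel for`. The histogram update is protected by `omp atomic`. The following is a minimal sketch of that layout, reduced to points on a line. The function name `pair_histogram`, its parameters, the `map` clauses, and the bin clamp are assumptions made for illustration and are not part of the commit. When compiled without OpenMP support the pragmas are ignored and the loops run serially.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not from the commit): histogram of pairwise distances
// between points on a line, using the split teams/threads directive layout.
std::vector<unsigned int> pair_histogram(const std::vector<double> &pos,
                                         double cut, int nbin)
{
    assert(nbin > 0 && cut > 0.0);
    const int n = static_cast<int>(pos.size());
    const double del = cut / nbin; // bin width
    std::vector<unsigned int> hist(nbin, 0u);
    const double *x = pos.data();
    unsigned int *h = hist.data();

    // Outer loop: iterations are distributed across teams.
#pragma omp target teams distribute map(to: x[0:n]) map(tofrom: h[0:nbin])
    for (int i = 0; i < n; i++)
    {
        // Inner loop: iterations are shared among the threads of one team.
#pragma omp parallel for
        for (int j = 0; j < n; j++)
        {
            const double r = std::fabs(x[i] - x[j]);
            if (r < cut)
            {
                int bin = static_cast<int>(r / del);
                if (bin >= nbin)
                    bin = nbin - 1; // guard against floating-point rounding
                // Several threads may hit the same bin, so update atomically.
#pragma omp atomic
                h[bin] += 1;
            }
        }
    }
    return hist;
}
```

For three points at 0, 1 and 2 with `cut = 4.0` and `nbin = 4`, every ordered pair is counted, including each point paired with itself, which mirrors the behaviour of the loops in the diff.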

+ 3 - 3
hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/serial/rdf_overview.ipynb

@@ -42,11 +42,11 @@
     "\n",
     "-----\n",
     "\n",
-    "# <div style=\"text-align: center ;border:3px; border-style:solid; border-color:#FF0000  ; padding: 1em\">[Profiling lab](../../../../../profiler/English/jupyter_notebook/profiling.ipynb)</div> \n",
+    "# <div style=\"text-align: center ;border:3px; border-style:solid; border-color:#FF0000  ; padding: 1em\">[Profiling lab](../../../../../profiler/English/jupyter_notebook/nsight_systems.ipynb)</div> \n",
     "\n",
     "-----\n",
     "\n",
-    "Now, that we are familiar with the Nsight Profiler and know how to [NVTX](../../../../../profiler/English/jupyter_notebook/profiling.ipynb#nvtx), let's profile the serial code and checkout the output."
+    "Now, that we are familiar with the Nsight Profiler and know how to [NVTX](../../../../../profiler/English/jupyter_notebook/nsight_systems.ipynb#nvtx), let's profile the serial code and checkout the output."
    ]
   },
   {
@@ -119,7 +119,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.2"
+   "version": "3.7.4"
   }
  },
  "nbformat": 4,

+ 12 - 3
hpc/nways/nways_labs/nways_MD/English/nways_MD_start.ipynb

@@ -36,13 +36,22 @@
     "\n",
     " We will be following the cycle of Analysis - Parallelization - Optimization cycle throughout. To start with let us understand the Nsight tool ecosystem:   \n",
     "\n",
-    "- [Introduction to Profiling](../../profiler/English/jupyter_notebook/profiling.ipynb)\n",
+    "- [Nsight Systems](../../profiler/English/jupyter_notebook/nsight_systems.ipynb)\n",
     "    - Overview of Nsight profiler tools\n",
     "    - Introduction to Nsight Systems\n",
+    "    - How to view the report\n",
     "    - How to use NVTX APIs\n",
-    "    - Introduction to Nsight Compute\n",
     "    - Optimization Steps to parallel programming \n",
     "    \n",
+    "- [Nsight Compute](../../profiler/English/jupyter_notebook/nsight_compute.ipynb)\n",
+    "    - Introduction to Nsight Compute\n",
+    "    - Overview of sections\n",
+    "    - Roofline Charts\n",
+    "    - Memory Charts\n",
+    "    - Profiling a kernel using CLI\n",
+    "    - How to view the report\n",
+    " \n",
+    "    \n",
     "We will be working on porting a radial distribution function (RDF) to GPUs. Please choose one of the programming language to proceed working on RDF. \n",
     "\n",
     "\n",
@@ -127,7 +136,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.2"
+   "version": "3.7.4"
   }
  },
  "nbformat": 4,

+ 1 - 1
hpc/nways/nways_labs/nways_start.ipynb

@@ -75,7 +75,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.2"
+   "version": "3.7.4"
   }
  },
  "nbformat": 4,

BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/SOL-compute.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/baseline-compute.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/baseline1-compute.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/compute-cli-1.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/compute-cli-2.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/compute-memory.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/compute-memtable.png


hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/compute_open.png → hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/compute-open.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/compute-sections.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/compute-sets.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/expand-compute.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/header-compute.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/memory-compute.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/nsys-compute-command.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/nsys-compute-command1.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/nsys-compute-command2.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/roofline-achieved.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/roofline-analysis.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/roofline-baseline.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/roofline-compute.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/roofline-overview.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/sass-compute.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/sections-compute.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/source-compute.png


BIN
hpc/nways/nways_labs/profiler/English/jupyter_notebook/images/warning-compute.png


Diffs are not shown because the file is too large
+ 327 - 0
hpc/nways/nways_labs/profiler/English/jupyter_notebook/nsight_compute.ipynb


+ 2 - 5
hpc/nways/nways_labs/profiler/English/jupyter_notebook/profiling.ipynb

@@ -185,14 +185,11 @@
    "source": [
     "# Links and Resources\n",
     "\n",
-    "[OpenACC API Guide](https://www.openacc.org/sites/default/files/inline-files/OpenACC%20API%202.6%20Reference%20Guide.pdf)\n",
     "\n",
     "[NVIDIA Nsight System](https://docs.nvidia.com/nsight-systems/)\n",
     "\n",
     "\n",
-    "[CUDA Toolkit Download](https://developer.nvidia.com/cuda-downloads)\n",
-    "\n",
-    "**NOTE**: To be able to see the Nsight System profiler output, please download Nsight System latest version from [here](https://developer.nvidia.com/nsight-systems).\n",
+    "**NOTE**: To be able to see the Nsight System profiler output, please download Nsight System's latest version from [here](https://developer.nvidia.com/nsight-systems).\n",
     "\n",
     "Don't forget to check out additional [OpenACC Resources](https://www.openacc.org/resources) and join our [OpenACC Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.\n",
     "\n",
@@ -221,7 +218,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.2"
+   "version": "3.7.4"
   }
  },
  "nbformat": 4,

Diffs are not shown because the file is too large
+ 0 - 227
hpc/nways/nways_labs/profiler/English/jupyter_notebook/profiling-c.ipynb