4 years ago · 4081202718
--- a/hpc/nways/README.md
+++ b/hpc/nways/README.md
@@ -42,7 +42,7 @@ For instance :
 
																 `sudo docker build -t myimage:1.0 .`
															
 
																-While in the case of **Python** you have to specify the dockerfile name using flag **"-f"**, therefore run:
															
 
																+For **Python**, you have to specify the Dockerfile name using the **"-f"** flag, so run:
															
 
																 `sudo docker build -f <dockerfile name> -t <imagename>:<tagnumber> .`
															
@@ -51,7 +51,7 @@ For example :
 
																 `sudo docker build -f Dockerfile_python -t myimage:1.0 .`
															
 
																-For C, Fortran, and Python, the code labs have been written using Jupyter notebooks and a Dockerfile has been built to simplify deployment. In order to serve the docker instance for a student, it is necessary to expose port 8000 from the container, for instance, the following command would expose port 8000 inside the container as port 8000 on the lab machine:
															
 
																+For C, Fortran, and Python, the code labs have been written using Jupyter notebooks, and a Dockerfile has been built to simplify deployment. To serve the Docker instance for a student, it is necessary to expose port 8888 from the container. For example, the following command would expose port 8888 inside the container as port 8888 on the lab machine:
															
 
																 `sudo docker run --rm -it --gpus=all -p 8888:8888 myimage:1.0`
															
--- a/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/Final_Remarks.ipynb
+++ b/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/Final_Remarks.ipynb
--- a/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/cupy/cupy_RDF.ipynb
+++ b/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/cupy/cupy_RDF.ipynb
@@ -280,22 +280,22 @@
 
																     "---\n",
															
 
																     "\n",
															
 
																     "\n",
															
 
																-    "## Links and Resources\n",
															
 
																+    "# Links and Resources\n",
															
 
																     "\n",
															
 
																     "[NVIDIA Nsight System](https://docs.nvidia.com/nsight-systems/)\n",
															
 
																     "\n",
															
 
																-    "[CUDA Toolkit Download](https://developer.nvidia.com/cuda-downloads)\n",
															
 
																+    "[NVIDIA CUDA Toolkit](https://developer.nvidia.com/cuda-downloads)\n",
															
 
																     "\n",
															
 
																-    "**NOTE**: To be able to see the Nsight System profiler output, please download Nsight System latest version from [here](https://developer.nvidia.com/nsight-systems).\n",
															
 
																+    "**NOTE**: To be able to see the Nsight Systems profiler output, please download the latest version of Nsight Systems from [here](https://developer.nvidia.com/nsight-systems).\n",
															
 
																     "\n",
															
 
																     "Don't forget to check out additional [OpenACC Resources](https://www.openacc.org/resources) and join our [OpenACC Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.\n",
															
 
																     "\n",
															
 
																-    "--- \n",
															
 
																+    "---\n",
															
 
																     "\n",
															
 
																     "\n",
															
 
																     "## Licensing \n",
															
 
																     "\n",
															
 
																-    "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0)."
															
 
																+    "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0)."
															
 
																    ]
															
 
																   }
															
 
																  ],
															
--- a/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/cupy/cupy_guide.ipynb
+++ b/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/cupy/cupy_guide.ipynb
--- a/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/cupy/serial_RDF.ipynb
+++ b/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/cupy/serial_RDF.ipynb
@@ -15,14 +15,13 @@
 
																     "\n",
															
 
																     "## A Recap on RDF\n",
															
 
																     "\n",
															
 
																-    "- The radial distribution function (RDF) denoted in equations by g(r) defines the probability of finding a particle at a distance r from another tagged particle. \n",
															
 
																-    "- The RDF is strongly dependent on the type of matter so will vary greatly for solids, gases and liquids.\n",
															
 
																-    "- It is observed the code complexity of the algorithm in 𝑁^2. Let us get into details of the accelerated code analysis. \n",
															
 
																+    "- The radial distribution function (RDF), denoted as g(r), defines the probability of finding a particle at a distance r from another tagged particle. The RDF is strongly dependent on the type of matter, so it will vary greatly for solids, gases, and liquids. You can read more [here](https://en.wikibooks.org/wiki/Molecular_Simulation/Radial_Distribution_Functions).\n",
															
 
																+    "- The time complexity of the algorithm is $O(N^{2})$. \n",
															
 
																     "- The input data for the serial code is fetched from a DCD binary trajectory file.\n",
															
 
																     "\n",
															
 
																     "\n",
															
 
																     "### The Serial Code\n",
															
 
																-    "- The Cell below consist of two functions namely **dcdreadhead** and **dcdreadframe**\n",
															
 
																+    "- The cell below consists of two functions, namely **dcdreadhead** and **dcdreadframe**\n",
															
 
																     "- The **dcdreadhead** function computes the total number of frames and atoms from the DCDFile **(input/alk.traj.dcd)**, while the **dcdreadframe** function reads 10 frames and 6720 atoms (note: each frame contains 6720 atoms) using the MDAnalysis library. \n",
															
 
																     "- Both functions run on the Host (CPU) and are being called from the function **main()**.\n",
															
 
																     "### <u>Cell 1</u>"
															
@@ -80,8 +79,8 @@
 
																     "##  pair_gpu function\n",
															
 
																     "\n",
															
 
																     "- The pair_gpu is the function where the main task of the RDF serial implementation is being executed. The function computes differences in xyz DCD frames.\n",
															
 
																-    "- The essence of njit(just-in-time) decorator is to get pair_gpu function to compile under no python mode, and this is really important for good performance. \n",
															
 
																-    "- The decorator **@njit** or **@jit(nopython=True)** ensures that an exception is raised when compilation fails as a way of to alert that a bug is found within the decorated function. You can read more [here](https://numba.pydata.org/numba-doc/latest/user/performance-tips.html).\n",
															
 
																+    "- The purpose of the njit (just-in-time) decorator is to compile the pair_gpu function in nopython mode, which is important for good performance. \n",
															
 
																+    "- The decorator **@njit** or **@jit(nopython=True)** ensures that an exception is raised when compilation fails as a way to alert the user that a bug is found within the decorated function. You can read more [here](https://numba.pydata.org/numba-doc/latest/user/performance-tips.html).\n",
															
 
																     "\n",
															
 
																     "### <u>Cell 2</u>"
															
 
																    ]
															
@@ -124,13 +123,13 @@
 
																    "cell_type": "markdown",
															
 
																    "metadata": {},
															
 
																    "source": [
															
 
																-    "#### Brief Analysis on internal task performed within pair_gpu function\n",
															
 
																-    "- The graphic below identifies the various operations executed in the pair_gpu function. This function executes three nested-loops using tricky indexing manipulation within the arrays.\n",
															
 
																+    "#### Brief Analysis of Tasks Performed within the pair_gpu Function\n",
															
 
																+    "- The graphic below identifies the various operations executed in the pair_gpu function. This function executes three nested loops using tricky indexing manipulation within the arrays.\n",
															
 
																     "\n",
															
 
																     "\n",
															
 
																     "<img src=\"../images/pair_gpu.png\" width=\"80%\"/>\n",
															
 
																     "\n",
															
 
																-    "- The indexing flow for the operation 1 is simulated using the graphic below. Each green box simualate the subtraction operation within the two inner loops (id1 & id2) while the indexes written in blue signifies the outer-most loop (frame) which iterates 10 times. \n",
															
 
																+    "- The indexing flow for operation 1 is illustrated in the graphic below. Each green box represents the subtraction operation within the two inner loops (id1 & id2), while the indexes written in blue signify the outermost loop (frame), which iterates 10 times. \n",
															
 
																     "\n",
															
 
																     "<img src=\"../images/pair_gpu_analysis.png\" width=\"80%\"/>\n",
															
 
																     "\n",
															
@@ -139,7 +138,7 @@
 
																     "\n",
															
 
																     "\n",
															
 
																     "### The Main Function\n",
															
 
																-    "- This is the entry point of the program where every other functions including the **pair_gpu** function are called. The output of the main function is written into two files. An image version of the output files (\"**cupy_RDF.dat**\" & \"**cupy_Pair_entropy.dat**\") are displayed below the code cell.\n",
															
 
																+    "- This is the entry point of the program, where every other function, including the **pair_gpu** function, is called. The output of the main function is written into two files. An image version of the output files (\"**cupy_RDF.dat**\" & \"**cupy_Pair_entropy.dat**\") is displayed below the code cell.\n",
															
 
																     "\n",
															
 
																     "### <u>Cell 3</u>"
															
 
																    ]
															
@@ -276,15 +275,15 @@
 
																     "\n",
															
 
																     "# Lab Task \n",
															
 
																     "\n",
															
 
																-    "1. **Run the serial code from cell 1, 2, & 3**\n",
															
 
																+    "1. **Run the serial code from cell 1, 2, & 3**.\n",
															
 
																     "    - Remove the **\"#\"** behind the **main()** before running the cell 3:\n",
															
 
																     "    ```python\n",
															
 
																     "       if __name__ == \"__main__\":\n",
															
 
																     "                main()\n",
															
 
																     "    ```\n",
															
 
																-    "2. **Now, lets start modifying the original code to CuPy code constructs.**\n",
															
 
																-    "> From the top menu, click on File, and Open **nways_serial.py** from the current directory at **Python/source_code/cupy** directory. Remember to SAVE your code after changes, before running below cells. \n",
															
 
																-    "> Hints: focus on the **pair_gpu** function and you may as well need to modify few lines in the **main** function."
															
 
																+    "2. **Now, let's start modifying the original code to CuPy code constructs.**\n",
															
 
																+    "> From the top menu, click on File, and open **nways_serial.py** from the **Python/source_code/cupy** directory. Remember to SAVE your code after changes, and then run the cell below. \n",
															
 
																+    "> Hints: focus on the **pair_gpu** function; you may need to modify a few lines in the **main** function as well."
															
 
																    ]
															
 
																   },
															
 
																   {
															
@@ -323,12 +322,12 @@
 
																    "cell_type": "markdown",
															
 
																    "metadata": {},
															
 
																    "source": [
															
 
																-    "To view the profiler report, you would need to [Download the profiler output](../../source_code/serial/serial_cpu_rdf.qdrep) and open it via the GUI. A sample expected profile report should is shown below.\n",
															
 
																+    "To view the profiler report, you need to [download the profiler output](../../source_code/serial/serial_cpu_rdf.qdrep) and open it via the graphical user interface (GUI). A sample expected profile report is shown below:\n",
															
 
																     "\n",
															
 
																     "<img src=\"../images/cupy_nsys1.png\"/>\n",
															
 
																     "<img src=\"../images/cupy_nsys3.png\"/>\n",
															
 
																     "\n",
															
 
																-    "From the profile report, we can see that the pair_gpu function now takes miliseconds to run as compared to the serial version which takes more than 3 seconds as shown [here](../serial/rdf_overview.ipynb). \n",
															
 
																+    "From the profile report, we can see that the pair_gpu function now takes milliseconds to run as compared to the serial version which takes more than 3 seconds as shown [here](../serial/rdf_overview.ipynb). \n",
															
 
																     " \n",
															
 
																     "\n",
															
 
																     "---\n",
															
@@ -339,7 +338,7 @@
 
																     "\n",
															
 
																     "## Post-Lab Summary\n",
															
 
																     "\n",
															
 
																-    "If you would like to download this lab for later viewing, it is recommend you go to your browsers File menu (not the Jupyter notebook file menu) and save the complete web page.  This will ensure the images are copied down as well. You can also execute the following cell block to create a zip-file of the files you've been working on, and download it with the link below.\n",
															
 
																+    "If you would like to download this lab for later viewing, we recommend you go to your browser's File menu (not the Jupyter notebook file menu) and save the complete web page. This will ensure the images are copied as well. You can also execute the following cell block to create a zip file of the files you've been working on and download it with the link below.\n",
															
 
																     "\n"
															
 
																    ]
															
 
																   },
															
@@ -374,9 +373,9 @@
 
																     "\n",
															
 
																     "[NVIDIA Nsight System](https://docs.nvidia.com/nsight-systems/)\n",
															
 
																     "\n",
															
 
																-    "[CUDA Toolkit Download](https://developer.nvidia.com/cuda-downloads)\n",
															
 
																+    "[NVIDIA CUDA Toolkit](https://developer.nvidia.com/cuda-downloads)\n",
															
 
																     "\n",
															
 
																-    "**NOTE**: To be able to see the Nsight System profiler output, please download Nsight System latest version from [here](https://developer.nvidia.com/nsight-systems).\n",
															
 
																+    "**NOTE**: To be able to see the Nsight Systems profiler output, please download the latest version of Nsight Systems from [here](https://developer.nvidia.com/nsight-systems).\n",
															
 
																     "\n",
															
 
																     "Don't forget to check out additional [OpenACC Resources](https://www.openacc.org/resources) and join our [OpenACC Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.\n",
															
 
																     "\n",
															
@@ -385,7 +384,7 @@
 
																     "\n",
															
 
																     "## Licensing \n",
															
 
																     "\n",
															
 
																-    "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0)."
															
 
																+    "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0)."
															
 
																    ]
															
 
																   }
															
 
																  ],
															
--- a/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/numba/numba_RDF.ipynb
+++ b/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/numba/numba_RDF.ipynb
@@ -273,9 +273,9 @@
 
																     "\n",
															
 
																     "[NVIDIA Nsight System](https://docs.nvidia.com/nsight-systems/)\n",
															
 
																     "\n",
															
 
																-    "[CUDA Toolkit Download](https://developer.nvidia.com/cuda-downloads)\n",
															
 
																+    "[NVIDIA CUDA Toolkit](https://developer.nvidia.com/cuda-downloads)\n",
															
 
																     "\n",
															
 
																-    "**NOTE**: To be able to see the Nsight System profiler output, please download Nsight System latest version from [here](https://developer.nvidia.com/nsight-systems).\n",
															
 
																+    "**NOTE**: To be able to see the Nsight Systems profiler output, please download the latest version of Nsight Systems from [here](https://developer.nvidia.com/nsight-systems).\n",
															
 
																     "\n",
															
 
																     "Don't forget to check out additional [OpenACC Resources](https://www.openacc.org/resources) and join our [OpenACC Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.\n",
															
 
																     "\n",
															
@@ -284,7 +284,7 @@
 
																     "\n",
															
 
																     "## Licensing \n",
															
 
																     "\n",
															
 
																-    "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0)."
															
 
																+    "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0)."
															
 
																    ]
															
 
																   }
															
 
																  ],
															
--- a/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/numba/numba_guide.ipynb
+++ b/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/numba/numba_guide.ipynb
@@ -11,7 +11,14 @@
 
																     "#  Numba Lab1: Numba For CUDA GPU\n",
															
 
																     "---\n",
															
 
																     "\n",
															
 
																-    "Before we begin, let's execute the cell below to display information about the CUDA driver and GPUs running on the server by running the `nvidia-smi` command. To do this, execute the cell block below by giving it focus (clicking on it with your mouse), and hitting Ctrl-Enter, or pressing the play button in the toolbar above. If all goes well, you should see some output returned below the grey cell."
															
 
																+    "## Learning Objectives\n",
															
 
																+    "- **The goal of this lab is to:**\n",
															
 
																+    "    -   enable you to quickly start using Numba (beginner to advanced level);\n",
															
 
																+    "    -   teach you to apply the concepts of CUDA GPU programming to HPC field(s); and\n",
															
 
																+    "    -   show you how to achieve computational speedup on GPUs to maximize the throughput of your HPC implementation.\n",
															
 
																+    "\n",
															
 
																+    "\n",
															
 
																+    "Before we begin, let's execute the cell below to display information about the CUDA driver and GPUs running on the server by running the `nvidia-smi` command. To do this, execute the cell block below by clicking on it with your mouse, and pressing Ctrl-Enter, or pressing the play button in the toolbar above. You should see some output returned below the grey cell."
															
 
																    ]
															
 
																   },
															
 
																   {
															
@@ -27,50 +34,41 @@
 
																    "cell_type": "markdown",
															
 
																    "metadata": {},
															
 
																    "source": [
															
 
																-    "\n",
															
 
																-    "\n",
															
 
																-    "### Learning Objectives\n",
															
 
																-    "- **The goal of this lab is to:**\n",
															
 
																-    "    -   quickly get you started with Numba from beginner to advanced level\n",
															
 
																-    "    -   teach you application of CUDA GPU programming concept in HPC field(s)\n",
															
 
																-    "    -   show you how to maximize the throughput of your HPC implementation through computational speedup on the GPU.  \n",
															
 
																     "     \n",
															
 
																-    "\n",
															
 
																-    "\n",
															
 
																     "##  Introduction\n",
															
 
																-    "- Numba is a just-in-time (jit) compiler for Python that works best on code that uses NumPy arrays, functions, and loops. Numba has set of decorators that can be specified before user-defined functions to determine how they are compiled.  \n",
															
 
																-    "- A decorated function written in python is compiled into CUDA kernel to speed up execution rate, thus, Numba supports CUDA GPU programming model. \n",
															
 
																-    "- A kernel is written in Numba automatically have direct access to NumPy arrays. This implies a great support for data visiblilty between the host (CPU) and the device (GPU). \n",
															
 
																+    "- Numba is a just-in-time (jit) compiler for Python that works best on code that uses NumPy arrays, functions, and loops. Numba has a set of decorators that can be placed above user-defined functions to determine how they are compiled.  \n",
															
 
																+    "- Numba supports the CUDA GPU programming model. A decorated function written in Python is compiled into a CUDA kernel to speed up execution. \n",
															
 
																+    "- A kernel written in Numba automatically has direct access to NumPy arrays. This provides good support for data visibility between the host (CPU) and the device (GPU). \n",
															
 
																     "\n",
															
 
																     "\n",
															
 
																     "###  Definition of Terms\n",
															
 
																     "- The CPU is called a **Host**.  \n",
															
 
																     "- The GPU is called a **Device**.\n",
															
 
																-    "- A GPU function launched by the host and executed on the device is called a **Kernels**.\n",
															
 
																-    "- A GPU function executed on the device which can only be called from the device is called a **Device function**.\n",
															
 
																+    "- A GPU function launched by the host and executed on the device is called a **Kernel**.\n",
															
 
																+    "- A GPU function that is executed on the device and can only be called from the device is called a **Device function**.\n",
															
 
																     "\n",
															
 
																     "### Note\n",
															
 
																-    "- It is recommended to visit the NVIDIA official documentary web page and read through [CUDA C programming guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide), because most CUDA programming features exposed by Numba map directly to the CUDA C language offered by NVidia. \n",
															
 
																-    "- Numba does not implement of these CUDA features of CUDA:\n",
															
 
																+    "- It is recommended to visit the official NVIDIA documentation web page and read through the [CUDA C programming guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide), because most CUDA programming features exposed by Numba map directly to the CUDA C language offered by NVIDIA. \n",
															
 
																+    "- Numba does not implement these CUDA features:\n",
															
 
																     "     - dynamic parallelism\n",
															
 
																     "     - texture memory\n",
															
 
																     "\n",
															
 
																     "## CUDA Kernel\n",
															
 
																     "- In CUDA, written code can be executed by hundreds or thousands of threads at a single run, hence, a solution is modeled after the following thread hierarchy: \n",
															
 
																-    "    - **Grid**: A kernel executed as a collection of blocks \n",
															
 
																-    "    - **Thread Block**: Collection of threads that can communicate via a shared memory. Each thread is executed by a core.\n",
															
 
																+    "    - **Grid**: A kernel executed as a collection of blocks. \n",
															
 
																+    "    - **Thread Block**: Collection of threads that can communicate via shared memory. Each thread is executed by a core.\n",
															
 
																     "    - **Thread**: Single execution units that run kernels on GPU.\n",
															
 
																     "- Numba exposes three kinds of GPU memory: \n",
															
 
																     "    - global device memory  \n",
															
 
																     "    - shared memory \n",
															
 
																-    "    - local memory. \n",
															
 
																+    "    - local memory \n",
															
 
																     "- Memory access should be carefully considered in order to keep bandwidth contention at minimal.\n",
															
 
																     "\n",
															
 
																     " <img src=\"../images/thread_blocks.JPG\"/> <img src=\"../images/memory_architecture.png\"/> \n",
															
 
																     "\n",
															
 
																     "### Kernel Declaration\n",
															
 
																-    "- A kernel function is a GPU function that is called from a CPU code by specifying the number of block threads and threads per block, and can not explicitly return a value except through a passed array. \n",
															
 
																-    "- A kernel can be called multiple times with varying number of blocks per grid and threads per block after its has been compiled once.\n",
															
 
																+    "- A kernel function is a GPU function that is called from CPU code. The call must specify the number of blocks and the number of threads per block. A kernel cannot explicitly return a value; results are returned through an array passed as an argument. \n",
															
 
																+    "- Once compiled, a kernel can be called multiple times with a varying number of blocks per grid and threads per block.\n",
															
 
																     "\n",
															
 
																     "Example:\n",
															
 
																     "\n",
															
@@ -89,12 +87,12 @@
 
																     "```\n",
															
 
																     "\n",
															
 
																     "###### Choosing Block Size\n",
															
 
																-    "- The block size determines how many threads share a given area of shared memory.\n",
															
 
																+    "- The block size determines how many threads share a given area of the shared memory.\n",
															
 
																     "- The block size must be large enough to accommodate all computation units. See more details [here](https://docs.nvidia.com/cuda/cuda-c-programming-guide/).\n",
															
 
																     "\n",
															
 
																     "### Thread Positioning \n",
															
 
																-    "- When running a kernel, the kernel function’s code is executed by every thread once. Hence is it important to uniquely identify distinct threads.\n",
															
 
																-    "- The default way to determine a thread position in a grid and block is to manually compute the corresponding array position:\n",
															
 
																+    "- When running a kernel, the kernel function’s code is executed by every thread once. Therefore, it is important to uniquely identify distinct threads.\n",
															
 
																+    "- The default way to determine a thread position in a grid and block is to manually compute the corresponding array positions:\n",
															
 
																     "\n",
															
 
																     "\n",
															
 
																     "<img src=\"../images/thread_position.png\"/>\n",
															
@@ -154,21 +152,21 @@
 
																     "> - N is the size of the array and the number of threads in a single block is 128.\n",
															
 
																     "> - The **cuda.jit()** decorator indicates that the function (arrayAdd) below is a device kernel and should run parallel. The **tid** is the estimate of a unique index for each thread in the device memory grid: \n",
															
 
																     ">> **tid = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x**.\n",
															
 
																-    "> - **array_A** and **array_B** are input data, while **array_out** is output array and is already preload with zeros.\n",
															
 
																-    "> - The statement **blockpergrid  = N + (threadsperblock - 1) // threadsperblock** Computes the size of block per grid. This line of code is commonly use as the default formular to estimate number of blocks per grid in several GPU programming documentations.\n",
															
 
																-    "> - **arrayAdd[blockpergrid, threadsperblock](array_A, array_B, array_out)** indicate a call to a kernel function **addAdd** having the number of blocks per grid and number of threads per block in square bracket, while kernel arguments are in round brackets.\n",
															
 
																+    "> - **array_A** and **array_B** are input data, while **array_out** is the output array and is already preloaded with zeros.\n",
															
 
																+    "> - The statement **blockpergrid = (N + (threadsperblock - 1)) // threadsperblock** computes the number of blocks per grid. This ceiling division is commonly used as the default formula to estimate the number of blocks per grid in GPU programming documentation.\n",
															
 
																+    "> - **arrayAdd[blockpergrid, threadsperblock](array_A, array_B, array_out)** indicates a call to the kernel function **arrayAdd**, with the number of blocks per grid and the number of threads per block in square brackets and the kernel arguments in round brackets.\n",
															
 
																     "\n",
															
 
																     "\n",
															
 
																     "\n",
															
 
																     "\n",
															
 
																-    "###  Matrix multiplication on 2D Array \n",
															
 
																+    "###  Matrix Multiplication on 2D Array \n",
															
 
																     "\n",
															
 
																     "<img src=\"../images/2d_array.png\"/>\n",
															
 
																     "\n",
															
 
																     "<img src=\"../images/2d_col_mult.png\"/>\n",
															
 
																     "\n",
															
 
																     "> **Note**\n",
															
 
																-    "> - **Approach 2** would not be possible if the matrix size exceed the maximum number of threads per block on the device, while **Approach 1** would continue to execute. Most latest GPUs have maximum of 1024 threads per thread block. \n",
															
 
																+    "> - **Approach 2** would not be possible if the matrix size exceeds the maximum number of threads per block on the device, while **Approach 1** would continue to execute. The latest GPUs have a maximum of 1024 threads per thread block. \n",
															
 
																     "\n",
															
 
																     "### Example 2:  Matrix multiplication "
															
 
																    ]
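The launch arithmetic described in the notes above can be checked on the host without a GPU. The sketch below is plain Python rather than Numba, and `blocks_per_grid` and `global_thread_ids` are hypothetical helper names introduced only for illustration. It assumes the intended formula is the ceiling division `(N + threadsperblock - 1) // threadsperblock` and emulates `tid = threadIdx.x + blockIdx.x * blockDim.x` for every thread in a 1D grid.

```python
def blocks_per_grid(n, threads_per_block):
    """Ceiling division: smallest block count whose threads cover n elements."""
    return (n + threads_per_block - 1) // threads_per_block


def global_thread_ids(num_blocks, threads_per_block):
    """Emulate tid = threadIdx.x + blockIdx.x * blockDim.x for a 1D grid."""
    return [
        thread_idx + block_idx * threads_per_block
        for block_idx in range(num_blocks)
        for thread_idx in range(threads_per_block)
    ]


N = 500                 # illustrative array size (assumption, not from the lab)
threadsperblock = 128
blockpergrid = blocks_per_grid(N, threadsperblock)
tids = global_thread_ids(blockpergrid, threadsperblock)

# Threads with tid >= N have no element to process, which is why a kernel
# body is usually guarded with `if tid < N`.
active = [tid for tid in tids if tid < N]
print(blockpergrid, len(tids), len(active))
```

Because the block count is rounded up, the grid may contain more threads than elements; the bounds check keeps the surplus threads from writing past the end of the array.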
															
@@ -215,7 +213,7 @@
 
																    "cell_type": "markdown",
															
 
																    "metadata": {},
															
 
																    "source": [
															
 
																-    "### Exaample 3: A 225 × 225 Matrix Multiplication"
															
 
																+    "### Example 3: A 225 × 225 Matrix Multiplication"
															
 
																    ]
															
 
																   },
															
 
																   {
															
@@ -251,11 +249,11 @@
 
																    "cell_type": "markdown",
															
 
																    "metadata": {},
															
 
																    "source": [
															
 
																-    "### Thread reuse \n",
															
 
																+    "### Thread Reuse \n",
															
 
																     "\n",
															
 
																-    "- It is possible to specify a few number of threads for a data size such that threads are reused to complete the computation of the entire data. This is one of the approach used when a data to be computed is larger than the maximum number of threads available in a device memory. \n",
															
 
																+    "- It is possible to specify a small number of threads for a given data size such that threads are reused to complete the computation over the entire data. This is one of the approaches used when the data to be processed is larger than the maximum number of threads available on the device. \n",
															
 
																     "- This statement is used in a while loop: ***tid += cuda.blockDim.x * cuda.gridDim.x***\n",
															
 
																-    "- An example is given below to illustrates thread reuse. In the example, small number of thread is specified on purpose in order to show the possibility of this approach. \n",
															
 
																+    "- An example is given below to illustrate thread reuse. In the example, a small number of threads is specified on purpose in order to show the possibility of this approach. \n",
															
 
																     "\n",
															
 
																     "\n",
															
 
																     "#### Example 4: "
															
@@ -295,7 +293,7 @@
 
																    "metadata": {},
															
 
																    "source": [
															
 
																     "> **Note**\n",
															
 
																-    "> - The task in **example 4** is the same as in **example 1** but with limited number of threads specified, howbeit, the same result was achieved. \n",
															
 
																+    "> - The task in **example 4** is the same as in **example 1** but with a limited number of threads specified; nevertheless, the same result was achieved. \n",
															
 
																     "> - Note that this approach may delegate more threads than required. In the code above, an excess of 1 block of threads may be delegated.\n"
															
 
																    ]
															
 
																   },
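The thread-reuse pattern described above (advancing `tid` by `cuda.blockDim.x * cuda.gridDim.x` inside a while loop) can be emulated on the host to show that a small grid still covers every element. This is a plain-Python sketch, not the notebook's Example 4; `grid_stride_add` and its parameters are hypothetical names chosen for illustration.

```python
def grid_stride_add(a, b, num_blocks, threads_per_block):
    """Emulate a 1D kernel in which each thread handles several elements."""
    n = len(a)
    out = [0] * n
    stride = threads_per_block * num_blocks  # blockDim.x * gridDim.x
    for block_idx in range(num_blocks):
        for thread_idx in range(threads_per_block):
            tid = thread_idx + block_idx * threads_per_block
            # Each thread walks the array in steps of the total grid size.
            while tid < n:
                out[tid] = a[tid] + b[tid]
                tid += stride
    return out


a = list(range(1000))
b = [2 * x for x in a]
# Only 2 * 32 = 64 emulated threads for 1000 elements.
result = grid_stride_add(a, b, num_blocks=2, threads_per_block=32)
print(result[:5], result[-1])
```

On a real device the threads run concurrently rather than in nested Python loops, but the index arithmetic is the same: no element is visited twice and none is skipped.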
															
@@ -306,7 +304,7 @@
 
																     "## Memory Management\n",
															
 
																     "\n",
															
 
																     "### Data Transfer \n",
															
 
																-    "- When a kernel is excuted, Numba automatically transfer NumPy arrays to the device and vice versa.\n",
															
 
																+    "- When a kernel is executed, Numba automatically transfers NumPy arrays to the device and vice versa.\n",
															
 
																     "- In order to avoid the unnecessary transfer for read-only arrays, the following APIs can be used to manually control the transfer.\n",
															
 
																     "\n",
															
 
																     "##### 1.  Copy host to device\n",
															
@@ -334,7 +332,7 @@
 
																     "h_C = d_C.copy_to_host()\n",
															
 
																     "h_C = d_C.copy_to_host(stream=stream)\n",
															
 
																     "```\n",
															
 
																-    "### Example 5:  data movement "
															
 
																+    "### Example 5:  Data Movement "
															
 
																    ]
															
 
																   },
															
 
																   {
															
@@ -375,8 +373,8 @@
 
																    "source": [
															
 
																     "## Atomic Operation\n",
															
 
																     "\n",
															
 
																-    "- Atomic operation is required in a situation where multiple threads attempt to modify a common portion of the memory. \n",
															
 
																-    "- Typical example includes: simultaneous withdrawal from a bank account through ATM machine or large number of threads modfying a particular index of an array based on certain condition(s)\n",
															
 
																+    "- An atomic operation is required when multiple threads attempt to modify a common portion of the memory. \n",
															
 
																+    "- Typical examples include simultaneous withdrawals from a bank account through ATMs, or a large number of threads modifying a particular index of an array based on certain condition(s).\n",
															
 
																     "- List of presently implemented atomic operations supported by Numba are:\n",
															
 
																     "> **import numba.cuda as cuda**\n",
															
 
																     "> - cuda.atomic.add(array, index, value)\n",
															
@@ -436,7 +434,7 @@
 
																     "### 7. CUDA Ufuncs\n",
															
 
																     "\n",
															
 
																     "- The CUDA ufunc supports passing intra-device arrays to reduce traffic over the PCI-express bus. \n",
															
 
																-    "- It also support asynchronous mode by using stream keyword.\n",
															
 
																+    "- It also supports asynchronous mode by using the stream keyword.\n",
															
 
																     "\n",
															
 
																     "<img src=\"../images/ufunc.png\"/>"
															
 
																    ]
															
@@ -447,6 +445,9 @@
 
																    "metadata": {},
															
 
																    "outputs": [],
															
 
																    "source": [
															
 
																+    "# example: c = (a - b) * (a + b)\n",
															
 
																+    "# size of each array (A, B, C) is N = 10000\n",
															
 
																+    "\n",
															
 
																     "from numba import vectorize\n",
															
 
																     "import numba.cuda as cuda\n",
															
 
																     "import numpy as np\n",
															
@@ -467,9 +468,10 @@
 
																    "cell_type": "markdown",
															
 
																    "metadata": {},
															
 
																    "source": [
															
 
																-    "#### Device function\n",
															
 
																+    "### Device Function\n",
															
 
																     "\n",
															
 
																-    "- The CUDA device functions can only be invoked from within the device and can return a value like normal functions. The device function is usually placed before the CUDA ufunc kernel otherwise a call to the device function may not be visible inside the ufunc kernel."
															
 
																+    "- CUDA device functions can only be invoked from within the device and can return a value like normal functions. The device function is usually placed before the CUDA ufunc kernel; otherwise, a call to the device function may not be visible inside the ufunc kernel.\n",
															
 
																+    "- The attributes <i>device=True</i> and <i>inline=True</i> indicate that <i>\"device_ufunc\"</i> is a device function."
															
 
																    ]
															
 
																   },
															
 
																   {
															
@@ -478,6 +480,8 @@
 
																    "metadata": {},
															
 
																    "outputs": [],
															
 
																    "source": [
															
 
																+    "# example: c = sqrt((a - b) * (a + b))\n",
															
 
																+    "\n",
															
 
																     "from numba import vectorize\n",
															
 
																     "import numba.cuda as cuda\n",
															
 
																     "import numpy as np\n",
															
@@ -506,7 +510,7 @@
 
																     "\n",
															
 
																     "## Lab Task\n",
															
 
																     "\n",
															
 
																-    "In this section, you are expected to click on the **Serial code Lab Assignment** link and proceed to Lab 2. In this lab you will find three python serial code functions. You are required to revise the **pair_gpu** function and make it run on the GPU, and likewise do a few modifications on the **main** function.\n",
															
 
																+    "In this section, you are expected to click on the **Serial Code Lab Assignment** link and proceed to Lab 2. In this lab, you will find three Python serial code functions. You are required to revise the **pair_gpu** function to run on the GPU, and likewise make a few modifications within the **main** function.\n",
															
 
																     "\n",
															
 
																     "## <div style=\"text-align:center; color:#FF0000; border:3px solid red;height:80px;\"> <b><br/> [Serial Code Lab Assignment](serial_RDF.ipynb) </b> </div>\n",
															
 
																     "\n",
															
@@ -514,7 +518,7 @@
 
																     "\n",
															
 
																     "## Post-Lab Summary\n",
															
 
																     "\n",
															
 
																-    "If you would like to download this lab for later viewing, it is recommend you go to your browsers File menu (not the Jupyter notebook file menu) and save the complete web page.  This will ensure the images are copied down as well. You can also execute the following cell block to create a zip-file of the files you've been working on, and download it with the link below.\n",
															
 
																+    "If you would like to download this lab for later viewing, we recommend you go to your browser's File menu (not the Jupyter notebook file menu) and save the complete web page. This will ensure the images are copied as well. You can also execute the following cell block to create a zip-file of the files you've been working on and download it with the link below.\n",
															
 
																     "\n"
															
 
																    ]
															
 
																   },
															
@@ -550,9 +554,9 @@
 
																     "\n",
															
 
																     "[NVIDIA Nsight System](https://docs.nvidia.com/nsight-systems/)\n",
															
 
																     "\n",
															
 
																-    "[CUDA Toolkit Download](https://developer.nvidia.com/cuda-downloads)\n",
															
 
																+    "[NVIDIA CUDA Toolkit](https://developer.nvidia.com/cuda-downloads)\n",
															
 
																     "\n",
															
 
																-    "**NOTE**: To be able to see the Nsight System profiler output, please download Nsight System latest version from [here](https://developer.nvidia.com/nsight-systems).\n",
															
 
																+    "**NOTE**: To be able to see the Nsight System profiler output, please download the latest version of the Nsight System from [here](https://developer.nvidia.com/nsight-systems).\n",
															
 
																     "\n",
															
 
																     "Don't forget to check out additional [OpenACC Resources](https://www.openacc.org/resources) and join our [OpenACC Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.\n",
															
 
																     "\n",
															
@@ -570,7 +574,7 @@
 
																     "\n",
															
 
																     "## Licensing \n",
															
 
																     "\n",
															
 
																-    "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0)."
															
 
																+    "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0)."
															
 
																    ]
															
 
																   }
															
 
																  ],
															
--- a/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/numba/serial_RDF.ipynb
+++ b/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/numba/serial_RDF.ipynb
@@ -15,14 +15,13 @@
 
																     "\n",
															
 
																     "## A Recap on RDF\n",
															
 
																     "\n",
															
 
																-    "- The radial distribution function (RDF) denoted in equations by g(r) defines the probability of finding a particle at a distance r from another tagged particle. \n",
															
 
																-    "- The RDF is strongly dependent on the type of matter so will vary greatly for solids, gases and liquids.\n",
															
 
																-    "- It is observed the code complexity of the algorithm in 𝑁^2. Let us get into details of the accelerated code analysis. \n",
															
 
																+    "- The radial distribution function (RDF) denoted as g(r) defines the probability of finding a particle at a distance r from another tagged particle. The RDF is strongly dependent on the type of matter so will vary greatly for solids, gases and liquids. You can read more [here](https://en.wikibooks.org/wiki/Molecular_Simulation/Radial_Distribution_Functions).\n",
															
 
																+    "- The computational complexity of the algorithm is $O(N^{2})$. \n",
															
 
																     "- The input data for the serial code is fetched from a DCD binary trajectory file.\n",
															
 
																     "\n",
															
 
																     "\n",
															
 
																-    "### The Serial\n",
															
 
																-    "- The Cell below consist of two functions namely **dcdreadhead** and **dcdreadframe**\n",
															
 
																+    "### The Serial Code\n",
															
 
																+    "- The cell below consists of two functions, namely **dcdreadhead** and **dcdreadframe**.\n",
															
 
																     "- The **dcdreadhead** function computes the total number of frames and atoms from the DCDFile **(input/alk.traj.dcd)**, while the **dcdreadframe** function reads 10 frames and 6720 atoms (note: each frame contains 6720 atoms) using the MDAnalysis library. \n",
															
 
																     "- Both functions run on the Host (CPU) and are being called from the function **main()**.\n",
															
 
																     "\n",
															
@@ -79,9 +78,9 @@
 
																    "source": [
															
 
																     "##  pair_gpu function\n",
															
 
																     "\n",
															
 
																-    "- The pair_gpu is the function where the main task of the RDF serial implementation is being executed. The function computes differences in xyz DCD frames. \n",
															
 
																-    "- The essence of njit(just-in-time) decorator is to get pair_gpu function to compile under no python mode, and this is really important for good performance. \n",
															
 
																-    "- The decorator **@njit** or **@jit(nopython=True)** ensures that an exception is raised when compilation fails as a way of to alert that a bug is found within the decorated function. You can read more [here](https://numba.pydata.org/numba-doc/latest/user/performance-tips.html).\n",
															
 
																+    "- The pair_gpu is the function where the main task of the RDF serial implementation is being executed. The function computes differences in xyz DCD frames.\n",
															
 
																+    "- The purpose of the njit (just-in-time) decorator is to compile the pair_gpu function in nopython mode, which is important for good performance. \n",
															
 
																+    "- The decorator **@njit** or **@jit(nopython=True)** ensures that an exception is raised when compilation fails as a way to alert the user that a bug is found within the decorated function. You can read more [here](https://numba.pydata.org/numba-doc/latest/user/performance-tips.html).\n",
															
 
																     "\n",
															
 
																     "### <u>Cell 2</u>"
															
 
																    ]
															
@@ -125,13 +124,13 @@
 
																    "cell_type": "markdown",
															
 
																    "metadata": {},
															
 
																    "source": [
															
 
																-    "#### Brief Analysis on internal task performed within pair_gpu function\n",
															
 
																-    "- The graphic below identifies the various operations executed in the pair_gpu function. This function executes three nested-loops using tricky indexing manipulation within the arrays.\n",
															
 
																+    "#### Brief Analysis of Tasks Performed within the pair_gpu Function\n",
															
 
																+    "- The graphic below identifies the various operations executed in the pair_gpu function. This function executes three nested loops using tricky indexing manipulation within the arrays.\n",
															
 
																     "\n",
															
 
																     "\n",
															
 
																     "<img src=\"../images/pair_gpu.png\" width=\"80%\"/>\n",
															
 
																     "\n",
															
 
																-    "- The indexing flow for the operation 1 is simulated using the graphic below. Each green box simualate the subtraction operation within the two inner loops (id1 & id2) while the indexes written in blue signifies the outer-most loop (frame) which iterates 10 times. \n",
															
 
																+    "- The indexing flow for operation 1 is illustrated in the graphic below. Each green box represents the subtraction operation within the two inner loops (id1 & id2), while the indexes written in blue signify the outermost loop (frame), which iterates 10 times. \n",
															
 
																     "\n",
															
 
																     "<img src=\"../images/pair_gpu_analysis.png\" width=\"80%\"/>\n",
															
 
																     "\n",
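The three nested loops described above (frame, id1, id2) are what make the serial algorithm quadratic in the number of atoms. The following plain-Python sketch illustrates that loop structure only; it is not the lab's `pair_gpu` implementation. It assumes no periodic boundary conditions and skips self-pairs, and `pair_histogram`, `nbins`, and `r_max` are hypothetical names introduced for this example.

```python
import math


def pair_histogram(frames, nbins, r_max):
    """Histogram of pair distances, visiting every ordered atom pair per frame."""
    hist = [0] * nbins
    bin_width = r_max / nbins
    for frame in frames:                  # outermost loop over frames
        n_atoms = len(frame)
        for id1 in range(n_atoms):        # first inner loop
            for id2 in range(n_atoms):    # second inner loop: N * N iterations
                if id1 == id2:
                    continue              # assumption: self-pairs are skipped
                dx = frame[id1][0] - frame[id2][0]
                dy = frame[id1][1] - frame[id2][1]
                dz = frame[id1][2] - frame[id2][2]
                r = math.sqrt(dx * dx + dy * dy + dz * dz)
                if r < r_max:
                    hist[int(r / bin_width)] += 1
    return hist


# One frame with three atoms; coordinates are made up for illustration.
frames = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]]
hist = pair_histogram(frames, nbins=3, r_max=3.0)
print(hist)
```

With 6720 atoms per frame, the two inner loops run roughly 45 million times per frame, which is why this function is the natural candidate for GPU acceleration.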
															
@@ -140,7 +139,7 @@
 
																     "\n",
															
 
																     "\n",
															
 
																     "### The Main Function\n",
															
 
																-    "- This is the entry point of the program where every other functions including the **pair_gpu** function are called. The output of the main function is written into two files. An image version of the output files (\"**cupy_RDF.dat**\" & \"**cupy_Pair_entropy.dat**\") are displayed below the code cell.\n",
															
 
																+    "- This is the entry point of the program, where every other function, including the **pair_gpu** function, is called. The output of the main function is written into two files. Image versions of the output files (\"**cupy_RDF.dat**\" & \"**cupy_Pair_entropy.dat**\") are displayed below the code cell.\n",
															
 
																     "\n",
															
 
																     "\n",
															
 
																     "### <u>Cell 3</u>"
															
@@ -277,15 +276,15 @@
 
																     "\n",
															
 
																     "## Lab Task \n",
															
 
																     "\n",
															
 
																-    "1. 1. **Run the serial code from cell 1, 2, & 3**\n",
															
 
																+    "1. **Run the serial code from cells 1, 2, & 3**.\n",
															
 
																     "    - Remove the **\"#\"** behind the **main()** before running the cell 3:\n",
															
 
																     "    ```python\n",
															
 
																     "       if __name__ == \"__main__\":\n",
															
 
																     "                main()\n",
															
 
																     "    ```\n",
															
 
																     "2. **Now, lets start modifying the original code to Numba code constructs.**\n",
															
 
																-    "> From the top menu, click on File, and Open **nways_serial.py** from the current directory at **Python/source_code/numba** directory. Remember to SAVE your code after changes, before running below cells. \n",
															
 
																-    "> Hints: focus on the **pair_gpu** function and you may as well need to modify few lines in the **main** function."
															
 
																+    "> From the top menu, click on File, and open **nways_serial.py** from the **Python/source_code/numba** directory. Remember to SAVE your code after making changes, and then run the cell below. \n",
															
 
																+    "> Hints: focus on the **pair_gpu** function; you may need to modify a few lines in the **main** function as well."
															
 
																    ]
															
 
																   },
															
 
																   {
															
@@ -324,12 +323,12 @@
 
																    "cell_type": "markdown",
															
 
																    "metadata": {},
															
 
																    "source": [
															
 
																-    "To view the profiler report, you would need to [Download the profiler output](../../source_code/serial/serial_cpu_rdf.qdrep) and open it via the GUI. A sample expected profile report is given below.\n",
															
 
																+    "To view the profiler report, you need to [download the profiler output](../../source_code/serial/serial_cpu_rdf.qdrep) and open it via the graphical user interface (GUI). A sample expected profile report is given below:\n",
															
 
																     "\n",
															
 
																     "<img src=\"../images/numba_nsys1.png\"/>\n",
															
 
																     "<img src=\"../images/numba_nsys2.png\"/>\n",
															
 
																     "\n",
															
 
																-    "From the profile report, we can see that the pair_gpu function now takes miliseconds to run as compared to the serial version which takes more than 3 seconds as shown [here](../serial/rdf_overview.ipynb). \n",
															
 
																+    "From the profile report, we can see that the pair_gpu function now takes milliseconds to run as compared to the serial version which takes more than 3 seconds as shown [here](../serial/rdf_overview.ipynb). \n",
															
 
																     "\n",
															
 
																     "---\n",
															
 
																     "### [View](../../source_code/numba/numba_rdf.py) or [Run](../../jupyter_notebook/numba/numba_RDF.ipynb)  Solution \n",
															
@@ -337,7 +336,7 @@
 
																     "\n",
															
 
																     "## Post-Lab Summary\n",
															
 
																     "\n",
															
 
																-    "If you would like to download this lab for later viewing, it is recommend you go to your browsers File menu (not the Jupyter notebook file menu) and save the complete web page.  This will ensure the images are copied down as well. You can also execute the following cell block to create a zip-file of the files you've been working on, and download it with the link below.\n"
															
 
																+    "If you would like to download this lab for later viewing, we recommend you go to your browser's File menu (not the Jupyter notebook file menu) and save the complete web page. This will ensure the images are copied as well. You can also execute the following cell block to create a zip-file of the files you've been working on and download it with the link below.\n"
															
 
																    ]
															
 
																   },
															
 
																   {
															
@@ -372,9 +371,9 @@
 
																     "\n",
															
 
																     "[NVIDIA Nsight System](https://docs.nvidia.com/nsight-systems/)\n",
															
 
																     "\n",
															
 
																-    "[CUDA Toolkit Download](https://developer.nvidia.com/cuda-downloads)\n",
															
 
																+    "[NVIDIA CUDA Toolkit](https://developer.nvidia.com/cuda-downloads)\n",
															
 
																     "\n",
															
 
																-    "**NOTE**: To be able to see the Nsight System profiler output, please download Nsight System latest version from [here](https://developer.nvidia.com/nsight-systems).\n",
															
 
																+    "**NOTE**: To be able to see the Nsight System profiler output, please download the latest version of the Nsight System from [here](https://developer.nvidia.com/nsight-systems).\n",
															
 
																     "\n",
															
 
																     "Don't forget to check out additional [OpenACC Resources](https://www.openacc.org/resources) and join our [OpenACC Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.\n",
															
 
																     "\n",
															
@@ -383,7 +382,7 @@
 
																     "\n",
															
 
																     "## Licensing \n",
															
 
																     "\n",
															
 
																-    "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0)."
															
 
																+    "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0)."
															
 
																    ]
															
 
																   }
															
 
																  ],
															
--- a/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/serial/rdf_overview.ipynb
+++ b/hpc/nways/nways_labs/nways_MD/English/Python/jupyter_notebook/serial/rdf_overview.ipynb
@@ -5,21 +5,21 @@
 
																    "metadata": {},
															
 
																    "source": [
															
 
																     "## RDF\n",
															
 
																-    "The radial distribution function (RDF) denoted in equations by g(r) defines the probability of finding a particle at a distance r from another tagged particle. The RDF is strongly dependent on the type of matter so will vary greatly for solids, gases and liquids.\n",
															
 
																+    "The radial distribution function (RDF) denoted as g(r) defines the probability of finding a particle at a distance r from another tagged particle. The RDF is strongly dependent on the type of matter so will vary greatly for solids, gases and liquids. You can read more [here](https://en.wikibooks.org/wiki/Molecular_Simulation/Radial_Distribution_Functions).\n",
															
 
																     "<img src=\"../images/rdf.png\" width=\"50%\" height=\"50%\">\n",
															
 
																-    " The radial distribution function (RDF) denoted in equations by g(r) defines the probability of finding a particle at a distance r from another tagged particle. \n"
															
 
																+    " \n"
															
 
																    ]
															
 
																   },
															
 
																   {
															
 
																    "cell_type": "markdown",
															
 
																    "metadata": {},
															
 
																    "source": [
															
 
																-    "As you might have observed the code complexity of the algorithm in $N^{2}$ . Let us get into details of the sequential code. **Understand and analyze** the code present at:\n",
															
 
																+    "The computational complexity of the algorithm is $O(N^{2})$. Let us get into the details of the serial code by clicking on the link below:\n",
															
 
																     "\n",
															
 
																     "[RDF Serial Code](../../source_code/serial/nways_serial.py)\n",
															
 
																     "\n",
															
 
																     "\n",
															
 
																-    "Open the downloaded file for inspection."
															
 
																+    "Open the downloaded file, analyze and understand the code if possible, and run the cell below."
															
 
																    ]
															
 
																   },
															
 
																   {
															
@@ -37,10 +37,10 @@
 
																    "cell_type": "markdown",
															
 
																    "metadata": {},
															
 
																    "source": [
															
 
																-    "We plan to follow the typical optimization cycle that every code needs to go through\n",
															
 
																+    "We plan to follow a typical optimization cycle that every code needs to go through:\n",
															
 
																     "<img src=\"../images/workflow.png\" width=\"70%\" height=\"70%\">\n",
															
 
																     "\n",
															
 
																-    "In order analyze the application we we will make use of profiler \"nsys\" and add \"nvtx\" marking into the code to get more information out of the serial code. Before running the below cells, let's first start by divining into the profiler lab to learn more about the tools. Using Profiler gives us the hotspots and helps to understand which function is important to be made parallel.\n",
															
 
																+    "In order to analyze the application, we will make use of the NVIDIA Nsight Systems profiler \"nsys\" and add NVIDIA Tools Extension (NVTX) annotations (\"nvtx\" markers) to the code to get more information out of the serial code. Before running the cell below, let's first start by diving into the profiler lab to learn more about the tools. The profiler identifies the hotspots and helps us understand which function(s) are most important to parallelize.\n",
															
 
																     "\n",
															
 
																     "-----\n",
															
 
																     "\n",
															
@@ -48,7 +48,7 @@
 
																     "\n",
															
 
																     "-----\n",
															
 
																     "\n",
															
 
																-    "Now, that we are familiar with the Nsight Profiler and know how to [NVTX](../../../../../profiler/English/jupyter_notebook/profiling-c.ipynb#nvtx), let's profile the serial code and checkout the output."
															
 
																+    "Now that we are familiar with the Nsight Systems profiler and know how to use [NVTX](../../../../../profiler/English/jupyter_notebook/profiling-c.ipynb#nvtx), let's profile the serial code and evaluate the output."
															
 
																    ]
															
 
																   },
															
 
																   {
															
@@ -64,18 +64,18 @@
 
																    "cell_type": "markdown",
															
 
																    "metadata": {},
															
 
																    "source": [
															
 
																-    "Once you run the above cell, you should see the following in the terminal.\n",
															
 
																+    "Once you run the above cell, you should see the following in the terminal:\n",
															
 
																     "\n",
															
 
																     "<img src=\"../images/serial_cpu_rdf1.png\" width=\"700px\" height=\"600px\"/>\n",
															
 
																     "<img src=\"../images/serial_cpu_rdf2.png\" width=\"700px\" height=\"400px\"/>\n",
															
 
																     "\n",
															
 
																-    "To view the profiler report, you would need to [Download the profiler output](../../source_code/serial/serial_cpu_rdf.qdrep) and open it via the GUI. For more information on how to open the report via the GUI, please checkout the section on [How to view the report](../../../../../profiler/English/jupyter_notebook/profiling-c.ipynb#gui-report). \n",
															
 
																+    "To view the profiler report, you need to [download the profiler output](../../source_code/serial/serial_cpu_rdf.qdrep) and open it via the graphical user interface (GUI). For more information on how to open the report via the GUI, please check out the section on [how to view the report](../../../../../profiler/English/jupyter_notebook/profiling-c.ipynb#gui-report). \n",
															
 
																     "\n",
															
 
																-    "From the timeline view, right click on the nvtx row and click the \"show in events view\". Now you can see the nvtx statistic at the bottom of the window which shows the duration of each range. In the following labs, we will look in to the profiler report in more detail. \n",
															
 
																+    "From the timeline view, right-click on the nvtx row and click \"show in events view\". The nvtx statistics appear at the bottom of the window and show the duration of each range. In the following labs, we will explore the profiler report in more detail. \n",
															
 
																     "\n",
															
 
																     "<img src=\"../images/serial_profile.png\" width=\"100%\" height=\"100%\"/>\n",
															
 
																     "\n",
															
 
																-    "The obvious next step is to make **Pair Calculation** algorithm parallel using different approaches to GPU Programming. Please follow the below link and choose one of the approaches to parallelise th serial code.\n",
															
 
																+    "The next step is to parallelize the **Pair Calculation** algorithm using one of the available approaches to GPU programming. Please follow the link below and choose one approach to parallelize the serial code.\n",
															
 
																     "\n",
															
 
																     "-----\n",
															
 
																     "\n",
															
@@ -94,7 +94,7 @@
 
																     "\n",
															
 
																     "[Profiling timelines with NVTX](https://devblogs.nvidia.com/cuda-pro-tip-generate-custom-application-profile-timelines-nvtx/)\n",
															
 
																     "\n",
															
 
																-    "**NOTE**: To be able to see the Nsight System profiler output, please download Nsight System latest version from [here](https://developer.nvidia.com/nsight-systems).\n",
															
 
																+    "**NOTE**: To be able to see the Nsight Systems profiler output, please download the latest version of NVIDIA Nsight Systems from [here](https://developer.nvidia.com/nsight-systems).\n",
															
 
																     "\n",
															
 
																     "Don't forget to check out additional [OpenACC Resources](https://www.openacc.org/resources) and join our [OpenACC Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.\n",
															
 
																     "\n",
															
@@ -102,8 +102,15 @@
 
																     "\n",
															
 
																     "## Licensing \n",
															
 
																     "\n",
															
 
																-    "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0). "
															
 
																+    "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). "
															
 
																    ]
															
 
																+  },
															
 
																+  {
															
 
																+   "cell_type": "code",
															
 
																+   "execution_count": null,
															
 
																+   "metadata": {},
															
 
																+   "outputs": [],
															
 
																+   "source": []
															
 
																   }
															
 
																  ],
															
 
																  "metadata": {
															
--- a/hpc/nways/nways_labs/nways_MD/English/nways_MD_start_python.ipynb
+++ b/hpc/nways/nways_labs/nways_MD/English/nways_MD_start_python.ipynb
--- a/hpc/nways/nways_labs/nways_MD/README.md
+++ b/hpc/nways/nways_labs/nways_MD/README.md
@@ -31,7 +31,7 @@ For example :
 
																 `sudo docker build -f Dockerfile_python -t myimage:1.0 .`
															
 
																-For C, Fortran, and Python, the code labs have been written using Jupyter notebooks and a Dockerfile has been built to simplify deployment. In order to serve the docker instance for a student, it is necessary to expose port 8000 from the container, for instance, the following command would expose port 8000 inside the container as port 8000 on the lab machine:
															
 
																+For C, Fortran, and Python, the code labs have been written using Jupyter notebooks, and a Dockerfile has been built to simplify deployment. In order to serve the Docker instance for a student, it is necessary to expose port 8888 from the container. For example, the following command would expose port 8888 inside the container as port 8888 on the lab machine:
															
 
																 `sudo docker run --rm -it --gpus=all -p 8888:8888 myimage:1.0`
															
--- a/hpc/nways/nways_labs/nways_start.ipynb
+++ b/hpc/nways/nways_labs/nways_start.ipynb
@@ -6,14 +6,14 @@
 
																    "source": [
															
 
																     "## N Ways to GPU Programming\n",
															
 
																     "\n",
															
 
																-    "## Learning objectives\n",
															
 
																-    "With the release of CUDA in 2007, different approaches to programming GPUs have evolved. Each approach has its own advantages and disadvantages. By the end of this bootcamp session, students will have a broader perspective on GPU programming approaches to help them select a programming model that better fits their applications' needs and constraints. The bootcamp will teach how to accelerate a real world scientific application  using the following methods:\n",
															
 
																+    "## Learning Objectives\n",
															
 
																+    "Since the release of NVIDIA CUDA in 2007, different approaches to GPU programming have evolved. Each approach has its own advantages and disadvantages. By the end of this bootcamp session, participants will have a broader perspective on GPU programming approaches to help them select a programming model that better fits their application's needs and constraints. The bootcamp will teach how to accelerate a real-world scientific application using the following methods:\n",
															
 
																     "* Standard: C++ stdpar, Fortran Do-Concurrent\n",
															
 
																     "* Directives: OpenACC, OpenMP\n",
															
 
																     "* Frameworks: Kokkos\n",
															
 
																     "* Programming Language Extension: CUDA C, CUDA Fortran, Python CuPy, Python Numba\n",
															
 
																     "\n",
															
 
																-    "Let's start with testing the CUDA Driver and GPU you are running the code on in this lab:"
															
 
																+    "Let's start by testing the CUDA Driver and GPU you are running the code on in this lab:"
															
 
																    ]
															
 
																   },
															
 
																   {
															
@@ -31,7 +31,7 @@
 
																    "source": [
															
 
																     "### Tutorial Outline\n",
															
 
																     "\n",
															
 
																-    "During this lab, we will be working on porting mini applications in Molecular Simulation (MD) domain to GPUs. You can choose to work with either of this application. Please click on one of the below links to start N Ways to GPU Programming in **MD** for:\n",
															
 
																+    "During this lab, we will be working on porting mini-applications in the Molecular Dynamics (MD) domain to GPUs. You can choose to work with either version of this application. Please click on one of the links below to start N Ways to GPU Programming in **MD** for:\n",
															
 
																     "\n",
															
 
																     "- [ C and Fortran ](nways_MD/English/nways_MD_start.ipynb) domain\n",
															
 
																     "- [Python ](nways_MD/English/nways_MD_start_python.ipynb) domain\n"
															
@@ -42,13 +42,13 @@
 
																    "metadata": {},
															
 
																    "source": [
															
 
																     "### Tutorial Duration\n",
															
 
																-    "The lab material will be presented in a 8hr session. Link to material is available for download at the end of the lab.\n",
															
 
																+    "The lab material will be presented in an 8-hour session. A link to the material is available for download at the end of the lab.\n",
															
 
																     "\n",
															
 
																     "### Content Level\n",
															
 
																     "Beginner, Intermediate\n",
															
 
																     "\n",
															
 
																     "### Target Audience and Prerequisites\n",
															
 
																-    "The target audience for this lab is researchers/graduate students and developers who are interested in learning about programming various ways to programming GPUs to accelerate their scientific applications.\n",
															
 
																+    "The target audience for this lab is researchers, graduate students, and developers who are interested in learning about various approaches to GPU programming to accelerate their scientific applications.\n",
															
 
																     "\n",
															
 
																     "Basic experience with C/C++ or Python or Fortran programming is needed. No GPU programming knowledge is required. \n",
															
 
																     "\n",
															
@@ -56,7 +56,7 @@
 
																     "\n",
															
 
																     "## Licensing \n",
															
 
																     "\n",
															
 
																-    "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0). "
															
 
																+    "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). "
															
 
																    ]
															
 
																   }
															
 
																  ],