5 years ago · a6d1e17bc6
--- a/hpc/miniprofiler/Dockerfile
+++ b/hpc/miniprofiler/Dockerfile
@@ -1,7 +1,7 @@
 
				 # Copyright (c) 2020 NVIDIA Corporation.  All rights reserved. 
			
 
				 
			
 
				-# To build: $ sudo docker build -t miniprofiler:latest .
			
 
				-# To run: $ sudo docker run --rm -it --gpus=all -p 8888:8888 miniprofiler:latest
			
 
				+# To build: $ sudo docker build -t nvidia_nsight_profiling_openacc:latest .
			
 
				+# To run: $ sudo docker run --rm -it --gpus=all -p 8888:8888 nvidia_nsight_profiling_openacc:latest
			
 
				 # Finally, open http://127.0.0.1:8888/
			
 
				 
			
 
				 FROM nvcr.io/hpc/pgi-compilers:ce
			
--- a/misc/jupyter_lab_template/appName/English/.ipynb_checkpoints/appName_start-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/.ipynb_checkpoints/appName_start-checkpoint.ipynb
@@ -1,78 +0,0 @@
 
				-{
			
 
				- "cells": [
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "## AppName Tutorial\n",
			
 
				-    "\n",
			
 
				-    "### Learning objectives\n",
			
 
				-    "What are the learning objectives? What do you want to teach the attendees? Will they apply the knowledge to an application? What is so significant about this application? How would this session benefit them? \n",
			
 
				-    "\n",
			
 
				-    "Use keywords to show what this tutorial is about without the need to start it. It should give the attendee an idea of what they will learn and how would this benefit them.\n",
			
 
				-    "\n",
			
 
				-    "Is there an optional exercise at the end? Who is the audience? \n",
			
 
				-    "\n",
			
 
				-    "- Learn ... (fill it with more detail)\n",
			
 
				-    "- Learn ... \n",
			
 
				-    "\n",
			
 
				-    "What will the attendees achieve at the end?\n",
			
 
				-    "\n",
			
 
				-    "### Tutorial Outline\n",
			
 
				-    "In this section, break down each lab and provide details of what to expect to learn in each lab.\n",
			
 
				-    "Note: Avoid adding too much information to one jupyter lab. Best practice here is to have one lab/learning objective in one notebook\n",
			
 
				-    "\n",
			
 
				-    "- Introduction ([C](C/jupyter_notebook/appName_c_lab1.ipynb) , [xx](Fortran/jupyter_notebook/appName_fortran_lab1.ipynb))\n",
			
 
				-    "    - Overview of ...\n",
			
 
				-    "    - How to use ...\n",
			
 
				-    "- Lab 1 (OPTIONAL: links to notebooks or any necessary part)\n",
			
 
				-    "    - ...\n",
			
 
				-    "    - ...\n",
			
 
				-    "    \n",
			
 
				-    "\n",
			
 
				-    "### Tutorial Duration\n",
			
 
				-    "The lab material will be presented in a xxx session. Link to material is available for download at the end of the lab.\n",
			
 
				-    "\n",
			
 
				-    "### Content Level\n",
			
 
				-    "Options could be Beginner, Intermediate, Advanced\n",
			
 
				-    "\n",
			
 
				-    "### Target Audience and Prerequisites\n",
			
 
				-    "Who is the target audience for this lab? What should they be interested in? What do you expect the attendees to know in advance?\n",
			
 
				-    "\n",
			
 
				-    "Examples: Do they need to know about C? C++? GPU architecture? Any programming experience? Does it have to be advanced, or would a basic understanding be enough?\n",
			
 
				-    "\n",
			
 
				-    "\n",
			
 
				-    "### Start Here\n",
			
 
				-    "You can choose between a [C-based code](C/jupyter_notebook/appName_c_lab1.ipynb) and a [xx-based code](Fortran/jupyter_notebook/appName_fortran_lab1.ipynb).\n",
			
 
				-    "\n",
			
 
				-    "--- \n",
			
 
				-    "\n",
			
 
				-    "## Licensing \n",
			
 
				-    "\n",
			
 
				-    "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0). \n"
			
 
				-   ]
			
 
				-  }
			
 
				- ],
			
 
				- "metadata": {
			
 
				-  "anaconda-cloud": {},
			
 
				-  "kernelspec": {
			
 
				-   "display_name": "Python 3",
			
 
				-   "language": "python",
			
 
				-   "name": "python3"
			
 
				-  },
			
 
				-  "language_info": {
			
 
				-   "codemirror_mode": {
			
 
				-    "name": "ipython",
			
 
				-    "version": 3
			
 
				-   },
			
 
				-   "file_extension": ".py",
			
 
				-   "mimetype": "text/x-python",
			
 
				-   "name": "python",
			
 
				-   "nbconvert_exporter": "python",
			
 
				-   "pygments_lexer": "ipython3",
			
 
				-   "version": "3.7.4"
			
 
				-  }
			
 
				- },
			
 
				- "nbformat": 4,
			
 
				- "nbformat_minor": 1
			
 
				-}
			
--- a/misc/jupyter_lab_template/appName/English/.ipynb_checkpoints/profiling_start-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/.ipynb_checkpoints/profiling_start-checkpoint.ipynb
@@ -1,101 +0,0 @@
 
				-{
			
 
				- "cells": [
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "## Profiling Tutorial\n",
			
 
				-    "\n",
			
 
				-    "### Learning objectives\n",
			
 
				-    "Learn how to profile your application with NVIDIA Nsight Systems and NVTX API calls to find performance limiters and bottlenecks and apply incremental parallelization strategies using OpenACC programming model. In this lab, you will:\n",
			
 
				-    "\n",
			
 
				-    "- Understand what a profiler is and which NVIDIA Nsight tool to choose in order to profile your application\n",
			
 
				-    "- Profile a sequential weather modeling application (integrated with NVIDIA Tools Extension (NVTX) APIs) with NVIDIA Nsight Systems to capture and trace CPU events and time ranges\n",
			
 
				-    "- Understand how to use NVIDIA Nsight Systems profiler’s report to detect hotspots and apply OpenACC compute constructs to the serial application to parallelise it on the GPU\n",
			
 
				-    "- Learn how to use Nsight Systems to identify issues such as an underutilized GPU device and unnecessary data movements in the application, and to apply optimization strategies step by step to expose more parallelism and utilize the computer’s CPU and GPU\n",
			
 
				-    "- Learn how to use occupancy to address performance limitations\n",
			
 
				-    "- Learn to follow a cyclical process (analyze, parallelize, optimize) to help you identify the portions of the code that would benefit from GPU acceleration, and apply parallelisation strategies and optimization techniques to see additional speedups and improve performance\n",
			
 
				-    "\n",
			
 
				-    "In this lab, we will be optimizing the serial Weather Simulation application written in both C and Fortran programming language. You are welcome to have a look at the mini weather lab and follow the steps to familiarize yourself with the application. \n",
			
 
				-    "\n",
			
 
				-    "An optional exercise on how to use Nsight Compute profiler is available for advanced users. This exercise covers basics on how and when to use the Nsight Compute profiler to get you started. Steps to unravel performance limiters will be presented through a simple exercise.\n",
			
 
				-    "\n",
			
 
				-    "\n",
			
 
				-    "### Tutorial Outline\n",
			
 
				-    "- Introduction ([C](C/jupyter_notebook/profiling-c.ipynb) , [Fortran](Fortran/jupyter_notebook/profiling-fortran.ipynb))\n",
			
 
				-    "    - Overview of Nsight profiler tools\n",
			
 
				-    "    - How to use NVTX APIs\n",
			
 
				-    "    - Overview of [Mini Weather application](C/jupyter_notebook/miniweather.ipynb)\n",
			
 
				-    "    - Optimization Steps to parallel programming with OpenACC\n",
			
 
				-    "- Lab 1 ([C](C/jupyter_notebook/profiling-c-lab1.ipynb) , [Fortran](Fortran/jupyter_notebook/profiling-fortran-lab1.ipynb))\n",
			
 
				-    "    - How to compile a serial application with PGI compiler\n",
			
 
				-    "    - How to profile a serial application with Nsight Systems and NVTX APIs\n",
			
 
				-    "    - How to use profiler's report to find hotspots\n",
			
 
				-    "    - Scaling and Amdahl's law and why it matters\n",
			
 
				-    "- Lab 2 ([C](C/jupyter_notebook/profiling-c-lab2.ipynb) , [Fortran](Fortran/jupyter_notebook/profiling-fortran-lab2.ipynb))\n",
			
 
				-    "    - Parallelise the serial application using OpenACC compute directives\n",
			
 
				-    "    - How to compile a parallel application with PGI compiler\n",
			
 
				-    "    - What does the compiler feedback tell us\n",
			
 
				-    "    - Profile with Nsight Systems\n",
			
 
				-    "    - Finding bottlenecks from Nsight Systems report\n",
			
 
				-    "- Lab 3 ([C](C/jupyter_notebook/profiling-c-lab3.ipynb) , [Fortran](Fortran/jupyter_notebook/profiling-fortran-lab3.ipynb))\n",
			
 
				-    "    - How to combine the knowledge from compiler feedback and profiler to optimize the application\n",
			
 
				-    "    - What is occupancy\n",
			
 
				-    "    - Demystifying Gangs, Workers, and Vectors\n",
			
 
				-    "    - Apply collapse clause to optimize the application further\n",
			
 
				-    "- Lab 4 ([C](C/jupyter_notebook/profiling-c-lab4.ipynb) , [Fortran](Fortran/jupyter_notebook/profiling-fortran-lab4.ipynb))\n",
			
 
				-    "    - Inspect data movement from the profiler's report\n",
			
 
				-    "    - Data management with OpenACC\n",
			
 
				-    "    - Apply incremental parallelization strategies and use profiler's report for the next step\n",
			
 
				-    "- Lab 5 ([C](C/jupyter_notebook/profiling-c-lab5.ipynb) , [Fortran](Fortran/jupyter_notebook/profiling-fortran-lab5.ipynb))\n",
			
 
				-    "    - Overview of Nsight Compute\n",
			
 
				-    "    - When and How to use Nsight Compute\n",
			
 
				-    "    - What does the profiler tell us, where is the bottleneck\n",
			
 
				-    "    - How to use baselines with Nsight Compute\n",
			
 
				-    "    \n",
			
 
				-    "\n",
			
 
				-    "### Tutorial Duration\n",
			
 
				-    "The lab material will be presented in a 2hr session. Link to material is available for download at the end of the lab.\n",
			
 
				-    "\n",
			
 
				-    "### Content Level\n",
			
 
				-    "Beginner, Intermediate\n",
			
 
				-    "\n",
			
 
				-    "### Target Audience and Prerequisites\n",
			
 
				-    "The target audience for this lab is researchers/graduate students and developers who are interested in getting hands-on experience with NVIDIA Nsight Systems through profiling a real-life parallel application using the OpenACC programming model and NVTX.\n",
			
 
				-    "\n",
			
 
				-    "While this tutorial does not assume any CUDA expertise, basic knowledge of OpenACC programming (e.g. compute constructs), GPU architecture, and programming experience with C/C++ or Fortran is desirable.\n",
			
 
				-    "\n",
			
 
				-    "### Start Here\n",
			
 
				-    "You can choose between a [C-based code](C/jupyter_notebook/profiling-c.ipynb) and a [Fortran-based code](Fortran/jupyter_notebook/profiling-fortran.ipynb).\n",
			
 
				-    "\n",
			
 
				-    "--- \n",
			
 
				-    "\n",
			
 
				-    "## Licensing \n",
			
 
				-    "\n",
			
 
				-    "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0). \n"
			
 
				-   ]
			
 
				-  }
			
 
				- ],
			
 
				- "metadata": {
			
 
				-  "anaconda-cloud": {},
			
 
				-  "kernelspec": {
			
 
				-   "display_name": "Python 3",
			
 
				-   "language": "python",
			
 
				-   "name": "python3"
			
 
				-  },
			
 
				-  "language_info": {
			
 
				-   "codemirror_mode": {
			
 
				-    "name": "ipython",
			
 
				-    "version": 3
			
 
				-   },
			
 
				-   "file_extension": ".py",
			
 
				-   "mimetype": "text/x-python",
			
 
				-   "name": "python",
			
 
				-   "nbconvert_exporter": "python",
			
 
				-   "pygments_lexer": "ipython3",
			
 
				-   "version": "3.7.4"
			
 
				-  }
			
 
				- },
			
 
				- "nbformat": 4,
			
 
				- "nbformat_minor": 1
			
 
				-}
			
--- a/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/appName_c_lab1-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/appName_c_lab1-checkpoint.ipynb
--- a/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/miniweather-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/miniweather-checkpoint.ipynb
@@ -1,114 +0,0 @@
 
				-{
			
 
				- "cells": [
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "# A MINI-WEATHER APPLICATION\n",
			
 
				-    "\n",
			
 
				-    "In this lab we will accelerate a Fluid Simulation in the context of atmosphere and weather simulation.\n",
			
 
				-    "The mini weather code mimics the basic dynamics seen in atmospheric weather and climate.\n",
			
 
				-    "\n",
			
 
				-    "The figure below demonstrates how a narrow jet of fast and slightly cold wind is injected into a balanced, neutral atmosphere at rest from the left domain near the model.\n",
			
 
				-    "\n",
			
 
				-    "<img src=\"images/Time.jpg\" width=\"80%\" height=\"80%\">\n",
			
 
				-    "\n",
			
 
				-    "Simulation is a repetitive process from 0 to the desired simulated time, increasing by Δt on every iteration.\n",
			
 
				-    "Each Δt step is practically the same operation. Each simulation is solving a differential equation that represents how the flow of the atmosphere (fluid) changes according to small perturbations. To simplify this solution the code uses dimensional splitting: Each dimension X and Z are treated independently.\n",
			
 
				-    "\n",
			
 
				-    "<img src=\"images/X_Y.jpg\" width=\"80%\" height=\"80%\">\n",
			
 
				-    "\n",
			
 
				-    "The differential equation has a time derivative that needs integrating, and a simple low-storage Runge-Kutta ODE solver is used to integrate the time derivative. Each time step, the order in which the dimensions are solved is reversed, giving second-order accuracy. \n",
			
 
				-    "\n",
			
 
				-    "<img src=\"images/Range-Kutta.jpg\" width=\"70%\" height=\"70%\">\n",
			
 
				-    "\n",
			
 
				-    "### The objective of this exercise is not to delve into the math but to use OpenACC to parallelize the code and improve its performance.\n",
			
 
				-    "\n",
			
 
				-    "The general flow of the code is as shown in diagram below. For each time step the differential equations are solved.\n",
			
 
				-    "\n",
			
 
				-    "<img src=\"images/Outer_Loop.jpg\" width=\"70%\" height=\"70%\">\n",
			
 
				-    "\n",
			
 
				-    "\n",
			
 
				-    "```cpp\n",
			
 
				-    "while (etime < sim_time) {\n",
			
 
				-    "    //If the time step leads to exceeding the simulation time, shorten it for the last step\n",
			
 
				-    "    if (etime + dt > sim_time) { dt = sim_time - etime; }\n",
			
 
				-    "    //Perform a single time step\n",
			
 
				-    "    perform_timestep(state,state_tmp,flux,tend,dt);\n",
			
 
				-    "    //Inform the user\n",
			
 
				-    "    if (masterproc) { printf( \"Elapsed Time: %lf / %lf\\n\", etime , sim_time ); }\n",
			
 
				-    "    //Update the elapsed time and output counter\n",
			
 
				-    "    etime = etime + dt;\n",
			
 
				-    "    output_counter = output_counter + dt;\n",
			
 
				-    "    //If it's time for output, reset the counter, and do output\n",
			
 
				-    "    if (output_counter >= output_freq) {\n",
			
 
				-    "      output_counter = output_counter - output_freq;\n",
			
 
				-    "      output(state,etime);\n",
			
 
				-    "    }\n",
			
 
				-    "  }\n",
			
 
				-    "  \n",
			
 
				-    "```\n",
			
 
				-    "\n",
			
 
				-    "At every time step the order of the directions is reversed to get second-order accuracy.\n",
			
 
				-    "\n",
			
 
				-    "\n",
			
 
				-    "<img src=\"images/Time_Step.jpg\" width=\"70%\" height=\"70%\">\n",
			
 
				-    "\n",
			
 
				-    "```cpp\n",
			
 
				-    "void perform_timestep( double *state , double *state_tmp , double *flux , double *tend , double dt ) {\n",
			
 
				-    "  if (direction_switch) {\n",
			
 
				-    "    //x-direction first\n",
			
 
				-    "    semi_discrete_step( state , state     , state_tmp , dt / 3 , DIR_X , flux , tend );\n",
			
 
				-    "    semi_discrete_step( state , state_tmp , state_tmp , dt / 2 , DIR_X , flux , tend );\n",
			
 
				-    "    semi_discrete_step( state , state_tmp , state     , dt / 1 , DIR_X , flux , tend );\n",
			
 
				-    "    //z-direction second\n",
			
 
				-    "    semi_discrete_step( state , state     , state_tmp , dt / 3 , DIR_Z , flux , tend );\n",
			
 
				-    "    semi_discrete_step( state , state_tmp , state_tmp , dt / 2 , DIR_Z , flux , tend );\n",
			
 
				-    "    semi_discrete_step( state , state_tmp , state     , dt / 1 , DIR_Z , flux , tend );\n",
			
 
				-    "  } else {\n",
			
 
				-    "    //z-direction first\n",
			
 
				-    "    semi_discrete_step( state , state     , state_tmp , dt / 3 , DIR_Z , flux , tend );\n",
			
 
				-    "    semi_discrete_step( state , state_tmp , state_tmp , dt / 2 , DIR_Z , flux , tend );\n",
			
 
				-    "    semi_discrete_step( state , state_tmp , state     , dt / 1 , DIR_Z , flux , tend );\n",
			
 
				-    "    //x-direction second\n",
			
 
				-    "    semi_discrete_step( state , state     , state_tmp , dt / 3 , DIR_X , flux , tend );\n",
			
 
				-    "    semi_discrete_step( state , state_tmp , state_tmp , dt / 2 , DIR_X , flux , tend );\n",
			
 
				-    "    semi_discrete_step( state , state_tmp , state     , dt / 1 , DIR_X , flux , tend );\n",
			
 
				-    "  }\n",
			
 
				-    "  if (direction_switch) { direction_switch = 0; } else { direction_switch = 1; }\n",
			
 
				-    "}\n",
			
 
				-    "```\n",
			
 
				-    "\n",
			
 
				-    "<img src=\"images/Semi_Discrete.jpg\" width=\"70%\" height=\"70%\">\n",
			
 
				-    "\n",
			
 
				-    "--- \n",
			
 
				-    "\n",
			
 
				-    "## Licensing \n",
			
 
				-    "\n",
			
 
				-    "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0). "
			
 
				-   ]
			
 
				-  }
			
 
				- ],
			
 
				- "metadata": {
			
 
				-  "anaconda-cloud": {},
			
 
				-  "kernelspec": {
			
 
				-   "display_name": "Python 3",
			
 
				-   "language": "python",
			
 
				-   "name": "python3"
			
 
				-  },
			
 
				-  "language_info": {
			
 
				-   "codemirror_mode": {
			
 
				-    "name": "ipython",
			
 
				-    "version": 3
			
 
				-   },
			
 
				-   "file_extension": ".py",
			
 
				-   "mimetype": "text/x-python",
			
 
				-   "name": "python",
			
 
				-   "nbconvert_exporter": "python",
			
 
				-   "pygments_lexer": "ipython3",
			
 
				-   "version": "3.7.4"
			
 
				-  }
			
 
				- },
			
 
				- "nbformat": 4,
			
 
				- "nbformat_minor": 1
			
 
				-}
			
--- a/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/profiling-c-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/profiling-c-checkpoint.ipynb
--- a/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/profiling-c-lab1-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/profiling-c-lab1-checkpoint.ipynb
--- a/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/profiling-c-lab2-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/profiling-c-lab2-checkpoint.ipynb
@@ -1,184 +0,0 @@
 
				-{
			
 
				- "cells": [
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "In this lab, we will optimize the weather simulation application written in C++ (if you prefer to use Fortran, click [this link](../../Fortran/jupyter_notebook/profiling-fortran.ipynb)). \n",
			
 
				-    "\n",
			
 
				-    "Let's execute the cell below to display information about the GPUs running on the server by running the pgaccelinfo command, which ships with the PGI compiler that we will be using. To do this, execute the cell block below by giving it focus (clicking on it with your mouse), and hitting Ctrl-Enter, or pressing the play button in the toolbar above. If all goes well, you should see some output returned below the grey cell."
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "code",
			
 
				-   "execution_count": null,
			
 
				-   "metadata": {},
			
 
				-   "outputs": [],
			
 
				-   "source": [
			
 
				-    "!pgaccelinfo"
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "## Exercise 2 \n",
			
 
				-    "\n",
			
 
				-    "### Learning objectives\n",
			
 
				-    "Learn how to identify and parallelise the computationally expensive routines in your application using OpenACC compute constructs (A compute construct is a parallel, kernels, or serial construct.). In this exercise you will:\n",
			
 
				-    "\n",
			
 
				-    "- Implement OpenACC parallelism using parallel directives to parallelise the serial application\n",
			
 
				-    "- Learn how to compile your parallel application with PGI compiler\n",
			
 
				-    "- Benchmark and compare the parallel version of the application with the serial version\n",
			
 
				-    "- Learn how to interpret PGI compiler feedback to ensure the applied optimizations were successful"
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "From the top menu, click on *File*, then *Open* `miniWeather_openacc.cpp` and `Makefile` from the `English/C/source_code/lab2` directory and inspect the code before running the cells below. We have already added OpenACC compute directives (`#pragma acc parallel`) around the expensive routines (loops) in the code.\n",
			
 
				-    "\n",
			
 
				-    "Once done, compile the code with `make`. View the PGI compiler feedback (enabled by adding `-Minfo=accel` flag) and investigate the compiler feedback for the OpenACC code. The compiler feedback provides useful information about applied optimizations."
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "code",
			
 
				-   "execution_count": null,
			
 
				-   "metadata": {},
			
 
				-   "outputs": [],
			
 
				-   "source": [
			
 
				-    "!cd ../source_code/lab2 && make"
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "Let's inspect part of the compiler feedback and see what it's telling us.\n",
			
 
				-    "\n",
			
 
				-    "<img src=\"images/cfeedback1.png\">\n",
			
 
				-    "\n",
			
 
				-    "- Using `-ta=tesla:managed`, we instruct the compiler to build for an NVIDIA Tesla GPU using \"CUDA Managed Memory\"\n",
			
 
				-    "- Using `-Minfo` command-line option, we will see all output from the compiler. In this example, we use `-Minfo=accel` to only see the output corresponding to the accelerator (in this case an NVIDIA GPU).\n",
			
 
				-    "- The first line of the output, `compute_tendencies_x`, tells us which function the following information is in reference to.\n",
			
 
				-    "- The line starting with 227, shows we created a parallel OpenACC loop. This loop is made up of gangs (a grid of blocks in CUDA language) and vector parallelism (threads in CUDA language) with the vector size being 128 per gang. `277, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */`\n",
			
 
				-    "- The rest of the information concerns data movement. The compiler detected a possible need to move data and handled it for us. We will get into this later in this lab.\n",
			
 
				-    "\n",
			
 
				-    "It is very important to inspect the feedback to make sure the compiler is doing what you have asked of it.\n",
			
 
				-    "\n",
			
 
				-    "Now, **Run** the application for small values of `nx_glob`,`nz_glob`, and `sim_time`: **400, 200, 10**"
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "code",
			
 
				-   "execution_count": null,
			
 
				-   "metadata": {},
			
 
				-   "outputs": [],
			
 
				-   "source": [
			
 
				-    "!cd ../source_code/lab2 && nsys profile -t nvtx --stats=true --force-overwrite true -o miniWeather_3 ./miniWeather 400 200 10"
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "You can see that the changes made actually slowed down the code: it runs slower than the non-accelerated CPU-only version. Let's check out the profiler's report. [Download the profiler output](../source_code/lab2/miniWeather_3.qdrep) and open it via the GUI. \n",
			
 
				-    "\n",
			
 
				-    "From the \"timeline view\" on the top pane, double click on the \"CUDA\" from the function table on the left and expand it. Zoom in on the timeline and you can see a pattern similar to the screenshot below. The blue boxes are the compute kernels and each of these groupings of kernels is surrounded by purple and teal boxes (annotated with red color) representing data movements. \n",
			
 
				-    "\n",
			
 
				-    "<img src=\"images/nsys_slow.png\" width=\"80%\" height=\"80%\">\n",
			
 
				-    "\n",
			
 
				-    "Hover your mouse over the kernels (blue boxes) one by one from each row and check out the provided information.\n",
			
 
				-    "\n",
			
 
				-    "<img src=\"images/occu-1.png\" width=\"60%\" height=\"60%\">\n",
			
 
				-    "\n",
			
 
				-    "**Note**: In the next two exercises, we start optimizing the application by improving the occupancy and reducing data movements."
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "## Post-Lab Summary\n",
			
 
				-    "\n",
			
 
				-    "If you would like to download this lab for later viewing, it is recommended that you go to your browser's File menu (not the Jupyter notebook File menu) and save the complete web page.  This will ensure the images are copied down as well. You can also execute the following cell block to create a zip file of the files you've been working on, and download it with the link below."
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "code",
			
 
				-   "execution_count": null,
			
 
				-   "metadata": {},
			
 
				-   "outputs": [],
			
 
				-   "source": [
			
 
				-    "%%bash\n",
			
 
				-    "cd ..\n",
			
 
				-    "rm -f openacc_profiler_files.zip\n",
			
 
				-    "zip -r openacc_profiler_files.zip *"
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "**After** executing the above zip command, you should be able to download the zip file [here](../openacc_profiler_files.zip)."
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "-----\n",
			
 
				-    "\n",
			
 
				-    "# <p style=\"text-align:center;border:3px; border-style:solid; border-color:#FF0000  ; padding: 1em\"> <a href=../../profiling_start.ipynb>HOME</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style=\"float:center\"> <a href=profiling-c-lab3.ipynb>NEXT</a></span> </p>\n",
			
 
				-    "\n",
			
 
				-    "-----"
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "# Links and Resources\n",
			
 
				-    "\n",
			
 
				-    "[OpenACC API Guide](https://www.openacc.org/sites/default/files/inline-files/OpenACC%20API%202.6%20Reference%20Guide.pdf)\n",
			
 
				-    "\n",
			
 
				-    "[NVIDIA Nsight Systems](https://docs.nvidia.com/nsight-systems/)\n",
			
 
				-    "\n",
			
 
				-    "[CUDA Toolkit Download](https://developer.nvidia.com/cuda-downloads)\n",
			
 
				-    "\n",
			
 
				-    "**NOTE**: To be able to see the Nsight Systems profiler output, please download the latest version of Nsight Systems from [here](https://developer.nvidia.com/nsight-systems).\n",
			
 
				-    "\n",
			
 
				-    "Don't forget to check out additional [OpenACC Resources](https://www.openacc.org/resources) and join our [OpenACC Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.\n",
			
 
				-    "\n",
			
 
				-    "--- \n",
			
 
				-    "\n",
			
 
				-    "## Licensing \n",
			
 
				-    "\n",
			
 
				-    "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0). "
			
 
				-   ]
			
 
				-  }
			
 
				- ],
			
 
				- "metadata": {
			
 
				-  "anaconda-cloud": {},
			
 
				-  "kernelspec": {
			
 
				-   "display_name": "Python 3",
			
 
				-   "language": "python",
			
 
				-   "name": "python3"
			
 
				-  },
			
 
				-  "language_info": {
			
 
				-   "codemirror_mode": {
			
 
				-    "name": "ipython",
			
 
				-    "version": 3
			
 
				-   },
			
 
				-   "file_extension": ".py",
			
 
				-   "mimetype": "text/x-python",
			
 
				-   "name": "python",
			
 
				-   "nbconvert_exporter": "python",
			
 
				-   "pygments_lexer": "ipython3",
			
 
				-   "version": "3.7.4"
			
 
				-  }
			
 
				- },
			
 
				- "nbformat": 4,
			
 
				- "nbformat_minor": 1
			
 
				-}
			
--- a/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/profiling-c-lab3-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/profiling-c-lab3-checkpoint.ipynb
--- a/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/profiling-c-lab4-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/profiling-c-lab4-checkpoint.ipynb
--- a/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/profiling-c-lab5-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/C/jupyter_notebook/.ipynb_checkpoints/profiling-c-lab5-checkpoint.ipynb
--- a/misc/jupyter_lab_template/appName/English/C/source_code/lab1/.ipynb_checkpoints/profiling-fortran-lab1-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/C/source_code/lab1/.ipynb_checkpoints/profiling-fortran-lab1-checkpoint.ipynb
--- a/misc/jupyter_lab_template/appName/English/Fortran/jupyter_notebook/.ipynb_checkpoints/appName_fortran_lab1-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/Fortran/jupyter_notebook/.ipynb_checkpoints/appName_fortran_lab1-checkpoint.ipynb
--- a/misc/jupyter_lab_template/appName/English/Fortran/jupyter_notebook/.ipynb_checkpoints/profiling-fortran-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/Fortran/jupyter_notebook/.ipynb_checkpoints/profiling-fortran-checkpoint.ipynb
--- a/misc/jupyter_lab_template/appName/English/Fortran/jupyter_notebook/.ipynb_checkpoints/profiling-fortran-lab1-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/Fortran/jupyter_notebook/.ipynb_checkpoints/profiling-fortran-lab1-checkpoint.ipynb
--- a/misc/jupyter_lab_template/appName/English/Fortran/jupyter_notebook/.ipynb_checkpoints/profiling-fortran-lab2-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/Fortran/jupyter_notebook/.ipynb_checkpoints/profiling-fortran-lab2-checkpoint.ipynb
@@ -1,201 +0,0 @@
 
				-{
			
 
				- "cells": [
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "In this lab, we will optimize the weather simulation application written in Fortran (if you prefer to use C++, click [this link](../../C/jupyter_notebook/profiling-c.ipynb)). \n",
			
 
				-    "\n",
			
 
				-    "Let's display information about the GPUs on the server by running the `pgaccelinfo` command, which ships with the PGI compiler that we will be using. To do this, execute the cell block below by giving it focus (clicking on it with your mouse) and hitting Ctrl-Enter, or by pressing the play button in the toolbar above. If all goes well, you should see some output returned below the grey cell."
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "code",
			
 
				-   "execution_count": null,
			
 
				-   "metadata": {},
			
 
				-   "outputs": [],
			
 
				-   "source": [
			
 
				-    "!pgaccelinfo"
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "## Exercise 2 \n",
			
 
				-    "\n",
			
 
				-    "### Learning objectives\n",
			
 
				-    "Learn how to identify and parallelise the computationally expensive routines in your application using OpenACC compute constructs (a compute construct is a `parallel`, `kernels`, or `serial` construct). In this exercise you will:\n",
			
 
				-    "\n",
			
 
				-    "- Implement OpenACC parallelism using parallel directives to parallelise the serial application\n",
			
 
				-    "- Learn how to compile your parallel application with the PGI compiler\n",
			
 
				-    "- Benchmark and compare the parallel version of the application with the serial version\n",
			
 
				-    "- Learn how to interpret PGI compiler feedback to ensure that the applied optimizations were successful\n"
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "From the top menu, click on *File*, then *Open*, and open `miniWeather_openacc.f90` and `Makefile` from the `English/Fortran/source_code/lab2` directory. Inspect the code before running the cells below. We have already added OpenACC compute directives (`!$acc parallel loop`) around the expensive routines (loops) in the code.\n",
			
 
				-    "\n",
			
 
				-    "Once done, compile the code with `make`. Inspect the PGI compiler feedback for the OpenACC code (enabled by adding the `-Minfo=accel` flag). The compiler feedback provides useful information about the applied optimizations."
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "code",
			
 
				-   "execution_count": null,
			
 
				-   "metadata": {},
			
 
				-   "outputs": [],
			
 
				-   "source": [
			
 
				-    "!cd ../source_code/lab2 && make"
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "Let's inspect part of the compiler feedback and see what it's telling us.\n",
			
 
				-    "\n",
			
 
				-    "<img src=\"images/ffeedback1-0.png\">\n",
			
 
				-    "\n",
			
 
				-    "- Using `-ta=tesla:managed`, we instruct the compiler to build for an NVIDIA Tesla GPU using \"CUDA Managed Memory\".\n",
			
 
				-    "- Using the `-Minfo` command-line option, we would see all output from the compiler. In this example, we use `-Minfo=accel` to see only the output corresponding to the accelerator (in this case an NVIDIA GPU).\n",
			
 
				-    "- The first line of the output, `compute_tendencies_x`, tells us which function the following information is in reference to.\n",
			
 
				-    "- The lines starting with 247 and 252 show that we created a parallel OpenACC loop. This loop is made up of gangs (a grid of blocks in CUDA language) and vector parallelism (threads in CUDA language), with a vector size of 128 per gang. \n",
			
 
				-    "- The lines starting with 249 and 252, marked `Loop is parallelizable`, tell us that the compiler found loops to accelerate at these lines in the source code.\n",
			
 
				-    "- The rest of the information concerns data movement. The compiler detected a possible need to move data and handled it for us. We will get into this later in this lab.\n",
			
 
				-    "\n",
			
 
				-    "It is very important to inspect the feedback to make sure the compiler is doing what you have asked of it.\n",
			
 
				-    "\n",
			
 
				-    "Now, **Run** the application for small values of `nx_glob`, `nz_glob`, and `sim_time`: **400, 200, 10**. "
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "code",
			
 
				-   "execution_count": null,
			
 
				-   "metadata": {},
			
 
				-   "outputs": [],
			
 
				-   "source": [
			
 
				-    "!cd ../source_code/lab2 && ./miniWeather"
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "**Profile** it with the Nsight Systems command-line tool, `nsys`."
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "code",
			
 
				-   "execution_count": null,
			
 
				-   "metadata": {},
			
 
				-   "outputs": [],
			
 
				-   "source": [
			
 
				-    "!cd ../source_code/lab2 && nsys profile -t nvtx --stats=true --force-overwrite true -o miniWeather_3 ./miniWeather"
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "You can see that the changes actually slowed the code down: it runs slower than the non-accelerated, CPU-only version. Let's check out the profiler's report. [Download the profiler output](../source_code/lab2/miniWeather_3.qdrep) and open it via the GUI. \n",
			
 
				-    "\n",
			
 
				-    "In the \"timeline view\" on the top pane, double-click \"CUDA\" in the function table on the left and expand it. Zoom in on the timeline and you can see a pattern similar to the screenshot below. The blue boxes are the compute kernels, and each of these groupings of kernels is surrounded by purple and teal boxes (annotated in red) representing data movements. \n",
			
 
				-    "\n",
			
 
				-    "<img src=\"images/nsys_slow.png\" width=\"80%\" height=\"80%\">\n",
			
 
				-    "\n",
			
 
				-    "Hover your mouse over the kernels (blue boxes) one by one in each row and check out the provided information.\n",
			
 
				-    "\n",
			
 
				-    "<img src=\"images/occu-1.png\" width=\"60%\" height=\"60%\">\n",
			
 
				-    "\n",
			
 
				-    "**Note**: In the next two exercises, we start optimizing the application by improving the occupancy and reducing data movements."
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "## Post-Lab Summary\n",
			
 
				-    "\n",
			
 
				-    "If you would like to download this lab for later viewing, it is recommended that you go to your browser's File menu (not the Jupyter notebook File menu) and save the complete web page. This will ensure the images are copied down as well. You can also execute the following cell block to create a zip file of the files you've been working on, and download it with the link below."
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "code",
			
 
				-   "execution_count": null,
			
 
				-   "metadata": {},
			
 
				-   "outputs": [],
			
 
				-   "source": [
			
 
				-    "%%bash\n",
			
 
				-    "cd ..\n",
			
 
				-    "rm -f openacc_profiler_files.zip\n",
			
 
				-    "zip -r openacc_profiler_files.zip *"
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "**After** executing the above zip command, you should be able to download the zip file [here](../openacc_profiler_files.zip)."
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "-----\n",
			
 
				-    "\n",
			
 
				-    "# <p style=\"text-align:center;border:3px; border-style:solid; border-color:#FF0000  ; padding: 1em\"> <a href=../../profiling_start.ipynb>HOME</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style=\"float:center\"> <a href=profiling-fortran-lab3.ipynb>NEXT</a></span> </p>\n",
			
 
				-    "\n",
			
 
				-    "-----"
			
 
				-   ]
			
 
				-  },
			
 
				-  {
			
 
				-   "cell_type": "markdown",
			
 
				-   "metadata": {},
			
 
				-   "source": [
			
 
				-    "# Links and Resources\n",
			
 
				-    "\n",
			
 
				-    "[OpenACC API Guide](https://www.openacc.org/sites/default/files/inline-files/OpenACC%20API%202.6%20Reference%20Guide.pdf)\n",
			
 
				-    "\n",
			
 
				-    "[NVIDIA Nsight Systems](https://docs.nvidia.com/nsight-systems/)\n",
			
 
				-    "\n",
			
 
				-    "[CUDA Toolkit Download](https://developer.nvidia.com/cuda-downloads)\n",
			
 
				-    "\n",
			
 
				-    "**NOTE**: To be able to see the Nsight Systems profiler output, please download the latest version of Nsight Systems from [here](https://developer.nvidia.com/nsight-systems).\n",
			
 
				-    "\n",
			
 
				-    "Don't forget to check out additional [OpenACC Resources](https://www.openacc.org/resources) and join our [OpenACC Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.\n",
			
 
				-    "\n",
			
 
				-    "--- \n",
			
 
				-    "\n",
			
 
				-    "## Licensing \n",
			
 
				-    "\n",
			
 
				-    "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0). "
			
 
				-   ]
			
 
				-  }
			
 
				- ],
			
 
				- "metadata": {
			
 
				-  "anaconda-cloud": {},
			
 
				-  "kernelspec": {
			
 
				-   "display_name": "Python 3",
			
 
				-   "language": "python",
			
 
				-   "name": "python3"
			
 
				-  },
			
 
				-  "language_info": {
			
 
				-   "codemirror_mode": {
			
 
				-    "name": "ipython",
			
 
				-    "version": 3
			
 
				-   },
			
 
				-   "file_extension": ".py",
			
 
				-   "mimetype": "text/x-python",
			
 
				-   "name": "python",
			
 
				-   "nbconvert_exporter": "python",
			
 
				-   "pygments_lexer": "ipython3",
			
 
				-   "version": "3.7.4"
			
 
				-  }
			
 
				- },
			
 
				- "nbformat": 4,
			
 
				- "nbformat_minor": 1
			
 
				-}
			
--- a/misc/jupyter_lab_template/appName/English/Fortran/jupyter_notebook/.ipynb_checkpoints/profiling-fortran-lab3-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/Fortran/jupyter_notebook/.ipynb_checkpoints/profiling-fortran-lab3-checkpoint.ipynb
--- a/misc/jupyter_lab_template/appName/English/Fortran/jupyter_notebook/.ipynb_checkpoints/profiling-fortran-lab4-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/Fortran/jupyter_notebook/.ipynb_checkpoints/profiling-fortran-lab4-checkpoint.ipynb
--- a/misc/jupyter_lab_template/appName/English/Fortran/jupyter_notebook/.ipynb_checkpoints/profiling-fortran-lab5-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/Fortran/jupyter_notebook/.ipynb_checkpoints/profiling-fortran-lab5-checkpoint.ipynb
--- a/misc/jupyter_lab_template/appName/English/Fortran/source_code/lab1/.ipynb_checkpoints/profiling-fortran-lab1-checkpoint.ipynb
+++ b/misc/jupyter_lab_template/appName/English/Fortran/source_code/lab1/.ipynb_checkpoints/profiling-fortran-lab1-checkpoint.ipynb