## Profiling Tutorial

### Learning objectives
Learn how to profile your application with NVIDIA Nsight Systems and NVTX API calls to find performance limiters and bottlenecks, and how to apply incremental parallelization strategies using the OpenACC programming model. In this lab, you will:

- Understand what a profiler is and which NVIDIA Nsight tool to choose to profile your application
- Profile a sequential weather modeling application (instrumented with NVIDIA Tools Extension (NVTX) APIs) with NVIDIA Nsight Systems to capture and trace CPU events and time ranges
- Understand how to use the NVIDIA Nsight Systems profiler’s report to detect hotspots and apply OpenACC compute constructs to the serial application to parallelize it on the GPU
- Learn how to use Nsight Systems to identify issues such as an underutilized GPU and unnecessary data movement in the application, and how to apply optimization strategies step by step to expose more parallelism and make use of both the CPU and the GPU
- Learn how to use occupancy to address performance limitations
- Learn to follow a cyclical process (analyze, parallelize, optimize) to identify the portions of the code that would benefit from GPU acceleration, then apply parallelization strategies and optimization techniques to gain additional speedup
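
As a rough sketch of what NVTX instrumentation can look like in C, a push/pop pair marks a named time range that Nsight Systems can show on its timeline. The names `sum_of_squares`, `RANGE_PUSH`, `RANGE_POP`, and the `USE_NVTX` build flag are hypothetical and chosen only for illustration; the header path assumes NVTX v3 and may differ with your toolkit version. The labs instrument the real application rather than this toy function.

```c
#include <assert.h>

/* Assumption: compile with -DUSE_NVTX to record ranges. Without the flag
   the macros expand to nothing, so the code still builds and runs. */
#ifdef USE_NVTX
#include <nvtx3/nvToolsExt.h>
#define RANGE_PUSH(name) nvtxRangePushA(name)
#define RANGE_POP()      nvtxRangePop()
#else
#define RANGE_PUSH(name) ((void)0)
#define RANGE_POP()      ((void)0)
#endif

/* Hypothetical stand-in for one phase of a simulation. */
double sum_of_squares(int n)
{
    assert(n >= 0);
    double total = 0.0;

    RANGE_PUSH("sum_of_squares");   /* the named range opens here */
    for (int i = 1; i <= n; ++i) {
        total += (double)i * (double)i;
    }
    RANGE_POP();                    /* the range closes here */

    return total;
}
```

Keeping the NVTX calls behind macros is one possible design choice: the same source builds with or without the NVTX header available.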

In this lab, we will optimize a serial weather simulation application that is available in both C and Fortran. You are welcome to look at the mini weather lab and follow its steps to familiarize yourself with the application.

An optional exercise on the Nsight Compute profiler is available for advanced users. It covers the basics of how and when to use Nsight Compute and walks through the steps for uncovering performance limiters in a simple example.


### Tutorial Outline
- Introduction ([C](C/jupyter_notebook/profiling-c.ipynb), [Fortran](Fortran/jupyter_notebook/profiling-fortran.ipynb))
 - Overview of Nsight profiler tools
 - How to use NVTX APIs
 - Overview of [Mini Weather application](C/jupyter_notebook/miniweather.ipynb)
 - Optimization steps for parallel programming with OpenACC
- Lab 1 ([C](C/jupyter_notebook/profiling-c-lab1.ipynb), [Fortran](Fortran/jupyter_notebook/profiling-fortran-lab1.ipynb))
 - How to compile a serial application with the NVIDIA HPC compiler
 - How to profile a serial application with Nsight Systems and NVTX APIs
 - How to use the profiler's report to find hotspots
 - Scaling and Amdahl's law and why it matters
- Lab 2 ([C](C/jupyter_notebook/profiling-c-lab2.ipynb), [Fortran](Fortran/jupyter_notebook/profiling-fortran-lab2.ipynb))
 - Parallelize the serial application using OpenACC compute directives
 - How to compile a parallel application with the NVIDIA HPC compiler
 - What the compiler feedback tells us
 - Profile with Nsight Systems
 - Finding bottlenecks from Nsight Systems report
- Lab 3 ([C](C/jupyter_notebook/profiling-c-lab3.ipynb), [Fortran](Fortran/jupyter_notebook/profiling-fortran-lab3.ipynb))
 - How to combine the knowledge from the compiler feedback and the profiler to optimize the application
 - What is occupancy
 - Demystifying Gangs, Workers, and Vectors
 - Apply the collapse clause to optimize the application further
- Lab 4 ([C](C/jupyter_notebook/profiling-c-lab4.ipynb), [Fortran](Fortran/jupyter_notebook/profiling-fortran-lab4.ipynb))
 - Inspect data movement from the profiler's report
 - Data management with OpenACC
 - Apply incremental parallelization strategies and use the profiler's report to decide the next step
- Lab 5 ([C](C/jupyter_notebook/profiling-c-lab5.ipynb), [Fortran](Fortran/jupyter_notebook/profiling-fortran-lab5.ipynb))
 - Overview of Nsight Compute
 - When and how to use Nsight Compute
 - What the profiler tells us and where the bottleneck is
 - How to use baselines with Nsight Compute
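
To make the Amdahl's law topic from Lab 1 concrete, the sketch below computes the overall speedup when only a fraction of the runtime is accelerated. The function name `amdahl_speedup` and its parameter names are assumptions introduced here for illustration, not code from the labs.

```c
#include <assert.h>

/* Amdahl's law: overall speedup = 1 / ((1 - p) + p / s)
   p: fraction of the original runtime that is accelerated (0 to 1)
   s: speedup achieved on that fraction (must be positive) */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

For example, accelerating 90% of the runtime by a factor of 10 gives an overall speedup of roughly 5.26, and no speedup of that 90% can push the total beyond 10. This is why the hotspots found in the profiler's report are the first candidates for parallelization.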
 

### Tutorial Duration
The lab material will be presented in a 2-hour session. A link to download the material is available at the end of the lab.

### Content Level
Beginner, Intermediate

### Target Audience and Prerequisites
The target audience for this lab is researchers, graduate students, and developers who are interested in getting hands-on experience with NVIDIA Nsight Systems by profiling a real-life parallel application that uses the OpenACC programming model and NVTX.

While this tutorial does not assume any CUDA experience, basic knowledge of OpenACC programming (e.g., compute constructs), GPU architecture, and programming experience with C/C++ or Fortran are desirable.
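
For readers unsure what "compute construct" refers to, here is a minimal sketch in C. The function `scale_field` and its arguments are hypothetical and not taken from the mini weather application; the directive combines a `parallel loop` compute construct with the `collapse` and `copy` clauses covered in the labs. A compiler without OpenACC support enabled simply ignores the pragma and runs the loops serially.

```c
#include <assert.h>

/* Hypothetical kernel: scale a 2D field stored row-major in a flat array. */
void scale_field(double *field, int nz, int nx, double factor)
{
    assert(nz >= 0 && nx >= 0);

    /* parallel loop: compute construct that offloads the loop nest.
       collapse(2):   treat the two loops as a single iteration space.
       copy(...):     copy the array to the device before the region
                      and back to the host afterwards. */
    #pragma acc parallel loop collapse(2) copy(field[0:nz*nx])
    for (int k = 0; k < nz; ++k) {
        for (int i = 0; i < nx; ++i) {
            field[k * nx + i] *= factor;
        }
    }
}
```

Copying the array in and out on every call is the simplest form of data management; reducing such transfers is one of the optimizations explored later in the labs.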

### Start Here
You can choose between a [C-based code](C/jupyter_notebook/profiling-c.ipynb) and a [Fortran-based code](Fortran/jupyter_notebook/profiling-fortran.ipynb).

--- 

## Licensing 

This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0).
