{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this lab, we will optimize the weather simulation application written in C++ (if you prefer to use Fortran, click [this link](../../Fortran/jupyter_notebook/profiling-fortran.ipynb)). \n",
    "\n",
    "Let's execute the cell below to display information about the GPUs running on the server by running the nvaccelinfo command, which ships with the NVIDIA HPC compiler that we will be using. To do this, execute the cell block below by giving it focus (clicking on it with your mouse), and hitting Ctrl-Enter, or pressing the play button in the toolbar above. If all goes well, you should see some output returned below the grey cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!nvaccelinfo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 4\n",
    "\n",
    "### Learning objectives\n",
    "Learn how to improve the performance of the application by managing data movement and reducing the unnecessary data transfers. In this exercise you will:\n",
    "\n",
    "- Learn about unified memory and how to automatically migrate data between CPU and GPU\n",
    "- Learn how to use it via NVIDIA HPC compiler managed option, and profiling managed memory\n",
    "- Learn how to identify redundant memory copies via Nsight Systems\n",
    "- Learn how to improve efficiency by reducing extra data copies via OpenACC data directive\n",
    "- Learn how to use NVIDIA HPC compiler feedback as a guidance on where to insert the OpenACC data directives\n",
    "- Apply data directives to the parallel application, benchmark and profile it\n",
    "\n",
    "Let's inspect the profiler report from previous exercise. From the \"timeline view\" on the top pane, double click on the \"CUDA\" from the function table on the left and expand it. Zoom in on the timeline and you can see a pattern similar to the screenshot below. The blue boxes are the compute kernels and each of these groupings of kernels is surrounded by purple and teal boxes (annotated with red color) representing data movements.\n",
    "\n",
    "What this graph is showing us is that we're doing a lot of data movement between GPU and CPU.\n",
    "    \n",
    " \n",
    "\n",
    "The compiler feedback we collected earlier tells us quite a bit about data movement too. If we look again at the compiler feedback from above, we see the following.\n",
    "\n",
    "
\n",
    "\n",
    "The compiler feedback we collected earlier tells us quite a bit about data movement too. If we look again at the compiler feedback from above, we see the following.\n",
    "\n",
    " \n",
    "\n",
    "The compiler feedback is telling us that the compiler has inserted data movement around our parallel region at line 277 which copies the `hy_dens_cell`, `hy_dens_theta_cell`, and `state` arrays in and out of GPU memory and also copies `flux` array out. \n",
    "\n",
    "The compiler can only work with the information we provide. It knows we need the `hy_dens_cell`, `hy_dens_theta_cell`, `state`, and `flux` arrays on the GPU for the accelerated section within the  `compute_tendencies_x` function, but we didn't tell the compiler anything about what happens to the data outside of those sections. Without this knowledge, the compiler has to copy the full arrays to the GPU and back to the CPU for each accelerated section. This is a lot of unnecessary data transfers. \n",
    "\n",
    "Ideally, we would want to move the data (example: `hy_dens_cell`, `hy_dens_theta_cell`, `state` arrays) to the GPU at the beginning, and only transfer back to the CPU at the end (if needed). And as for the `flux` array in this example, we do not need to copy any data back and forth. So we only need to create space on the device (GPU) for this array. \n",
    "\n",
    "We need to give the compiler information about how to reduce the extra and unnecessary data movement. By adding OpenACC `data` directive to a structured code block, the compiler will know how to manage data according to the clauses. For information on the data directive clauses, please visit [OpenACC 3.0 Specification](https://www.openacc.org/sites/default/files/inline-images/Specification/OpenACC.3.0.pdf).\n",
    "\n",
    "Now, add `data` directives to the code, save the file, re-compile via `make`, and profile it again.\n",
    "\n",
    "Click on the [miniWeather_openacc.cpp](../source_code/lab4/miniWeather_openacc.cpp) and [Makefile](../source_code/lab4/Makefile) links and modify `miniWeather_openacc.cpp` and `Makefile`. Remember to **SAVE** your code after changes, before running below cells."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd ../source_code/lab4 && make clean && make"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us start inspecting the compiler feedback and see if it applied the optimizations. Here is the screenshot of expected compiler feedback after adding the `data` directives. You can see that on line 281, compiler is generating default present for `hy_dens_cell`, `hy_dens_theta_cell`, `state`, and `flux` arrays. In other words, it is assuming that data is present on the GPU and it only copies data to the GPU only if the data do not exist.\n",
    "\n",
    "
\n",
    "\n",
    "The compiler feedback is telling us that the compiler has inserted data movement around our parallel region at line 277 which copies the `hy_dens_cell`, `hy_dens_theta_cell`, and `state` arrays in and out of GPU memory and also copies `flux` array out. \n",
    "\n",
    "The compiler can only work with the information we provide. It knows we need the `hy_dens_cell`, `hy_dens_theta_cell`, `state`, and `flux` arrays on the GPU for the accelerated section within the  `compute_tendencies_x` function, but we didn't tell the compiler anything about what happens to the data outside of those sections. Without this knowledge, the compiler has to copy the full arrays to the GPU and back to the CPU for each accelerated section. This is a lot of unnecessary data transfers. \n",
    "\n",
    "Ideally, we would want to move the data (example: `hy_dens_cell`, `hy_dens_theta_cell`, `state` arrays) to the GPU at the beginning, and only transfer back to the CPU at the end (if needed). And as for the `flux` array in this example, we do not need to copy any data back and forth. So we only need to create space on the device (GPU) for this array. \n",
    "\n",
    "We need to give the compiler information about how to reduce the extra and unnecessary data movement. By adding OpenACC `data` directive to a structured code block, the compiler will know how to manage data according to the clauses. For information on the data directive clauses, please visit [OpenACC 3.0 Specification](https://www.openacc.org/sites/default/files/inline-images/Specification/OpenACC.3.0.pdf).\n",
    "\n",
    "Now, add `data` directives to the code, save the file, re-compile via `make`, and profile it again.\n",
    "\n",
    "Click on the [miniWeather_openacc.cpp](../source_code/lab4/miniWeather_openacc.cpp) and [Makefile](../source_code/lab4/Makefile) links and modify `miniWeather_openacc.cpp` and `Makefile`. Remember to **SAVE** your code after changes, before running below cells."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd ../source_code/lab4 && make clean && make"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us start inspecting the compiler feedback and see if it applied the optimizations. Here is the screenshot of expected compiler feedback after adding the `data` directives. You can see that on line 281, compiler is generating default present for `hy_dens_cell`, `hy_dens_theta_cell`, `state`, and `flux` arrays. In other words, it is assuming that data is present on the GPU and it only copies data to the GPU only if the data do not exist.\n",
    "\n",
    " \n",
    "\n",
    "Now, **Profile** your code with Nsight Systems command line `nsys`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd ../source_code/lab4 && nsys profile -t nvtx,openacc --stats=true --force-overwrite true -o miniWeather_5 ./miniWeather"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Download and save the report file by holding down Shift and Right-Clicking [Here](../source_code/lab4/miniWeather_5.qdrep) and open it via the GUI. Have a look at the example expected output below:\n",
    "\n",
    "
\n",
    "\n",
    "Now, **Profile** your code with Nsight Systems command line `nsys`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd ../source_code/lab4 && nsys profile -t nvtx,openacc --stats=true --force-overwrite true -o miniWeather_5 ./miniWeather"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Download and save the report file by holding down Shift and Right-Clicking [Here](../source_code/lab4/miniWeather_5.qdrep) and open it via the GUI. Have a look at the example expected output below:\n",
    "\n",
    " \n",
    "\n",
    "Have a look at the data movements annotated with red color and compare it with the previous versions. We have accelerated the application and reduced the execution time by eliminating the unnecessary data transfers between CPU and GPU.\n",
    "\n",
    "**Note**: Next exercise gives an overview on introduction to Nsight Compute tool and it is optional."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Post-Lab Summary\n",
    "\n",
    "If you would like to download this lab for later viewing, it is recommend you go to your browsers File menu (not the Jupyter notebook file menu) and save the complete web page.  This will ensure the images are copied down as well. You can also execute the following cell block to create a zip-file of the files you've been working on, and download it with the link below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "cd ..\n",
    "rm -f openacc_profiler_files.zip\n",
    "zip -r openacc_profiler_files.zip *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**After** executing the above zip command, you should be able to download and save the zip file by holding down Shift and Right-Clicking [Here](../openacc_profiler_files.zip)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "-----\n",
    "\n",
    "#
\n",
    "\n",
    "Have a look at the data movements annotated with red color and compare it with the previous versions. We have accelerated the application and reduced the execution time by eliminating the unnecessary data transfers between CPU and GPU.\n",
    "\n",
    "**Note**: Next exercise gives an overview on introduction to Nsight Compute tool and it is optional."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Post-Lab Summary\n",
    "\n",
    "If you would like to download this lab for later viewing, it is recommend you go to your browsers File menu (not the Jupyter notebook file menu) and save the complete web page.  This will ensure the images are copied down as well. You can also execute the following cell block to create a zip-file of the files you've been working on, and download it with the link below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "cd ..\n",
    "rm -f openacc_profiler_files.zip\n",
    "zip -r openacc_profiler_files.zip *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**After** executing the above zip command, you should be able to download and save the zip file by holding down Shift and Right-Clicking [Here](../openacc_profiler_files.zip)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "-----\n",
    "\n",
    "#