{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "In this lab, we will optimize the weather simulation application written in Fortran (if you prefer to use C++, click [this link](../../C/jupyter_notebook/profiling-c.ipynb)). \n", "\n", "Let's execute the cell below to display information about the GPUs running on the server by running the pgaccelinfo command, which ships with the PGI compiler that we will be using. To do this, execute the cell block below by giving it focus (clicking on it with your mouse), and hitting Ctrl-Enter, or pressing the play button in the toolbar above. If all goes well, you should see some output returned below the grey cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pgaccelinfo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 2 \n", "\n", "### Learning objectives\n", "Learn how to identify and parallelise the computationally expensive routines in your application using OpenACC compute constructs (A compute construct is a parallel, kernels, or serial construct.). In this exercise you will:\n", "\n", "- Implement OpenACC parallelism using parallel directives to parallelise the serial application\n", "- Learn how to compile your parallel application with PGI compiler\n", "- Benchmark and compare the parallel version of the application with the serial version\n", "- Learn how to interpret PGI compiler feedback to ensure the applied optimization were successful\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the top menu, click on *File*, and *Open* `miniWeather_openacc.f90` and `Makefile` from the current directory at `Fortran/source_code/lab2` directory and inspect the code before running below cells.We have already added OpenACC compute directives (`!$acc parallel loop`) around the expensive routines (loops) in the code.\n", "\n", "Once done, compile the code with `make`. View the PGI compiler feedback (enabled by adding `-Minfo=accel` flag) and investigate the compiler feedback for the OpenACC code. The compiler feedback provides useful information about applied optimizations." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!cd ../source_code/lab2 && make clean && make" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's inspect part of the compiler feedback and see what it's telling us.\n", "\n", "\n", "\n", "- Using `-ta=tesla:managed`, instruct the compiler to build for an NVIDIA Tesla GPU using \"CUDA Managed Memory\"\n", "- Using `-Minfo` command-line option, we will see all output from the compiler. In this example, we use `-Minfo=accel` to only see the output corresponding to the accelerator (in this case an NVIDIA GPU).\n", "- The first line of the output, `compute_tendencies_x`, tells us which function the following information is in reference to.\n", "- The line starting with 247 and 252, shows we created a parallel OpenACC loop. This loop is made up of gangs (a grid of blocks in CUDA language) and vector parallelism (threads in CUDA language) with the vector size being 128 per gang. \n", "- The line starting with 249 and 252, `Loop is parallelizable` of the output tells us that on these lines in the source code, the compiler found loops to accelerate.\n", "- The rest of the information concerns data movement. Compiler detected possible need to move data and handled it for us. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "Once done, compile the code with `make` and inspect the PGI compiler feedback for the OpenACC code (enabled by adding the `-Minfo=accel` flag). The compiler feedback provides useful information about the applied optimizations." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!cd ../source_code/lab2 && make clean && make" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let's inspect part of the compiler feedback and see what it's telling us.\n", "\n", "- Using the `-ta=tesla:managed` option, we instruct the compiler to build for an NVIDIA Tesla GPU using \"CUDA Managed Memory\".\n", "- Using the `-Minfo` command-line option, we will see all output from the compiler. In this example, we use `-Minfo=accel` to see only the output corresponding to the accelerator (in this case an NVIDIA GPU).\n", "- The first line of the output, `compute_tendencies_x`, tells us which function the following information refers to.\n", "- The lines starting with 247 and 252 show that we created a parallel OpenACC loop. This loop is made up of gangs (a grid of blocks in CUDA terms) and vector parallelism (threads in CUDA terms), with a vector size of 128 per gang.\n", "- The lines starting with 249 and 252, `Loop is parallelizable`, tell us that on these lines in the source code the compiler found loops it could accelerate.\n", "- The rest of the information concerns data movement. The compiler detected a possible need to move data and handled it for us. We will get into this later in this lab.\n", "\n", "It is very important to inspect the feedback to make sure the compiler is doing what you have asked of it.\n", "\n", "Now, **Run** the application for small values of `nx_glob`, `nz_glob`, and `sim_time`: **40, 20, 1000**. " ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!cd ../source_code/lab2 && ./miniWeather" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**Profile** it with the Nsight Systems command-line tool `nsys`." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!cd ../source_code/lab2 && nsys profile -t nvtx,openacc --stats=true --force-overwrite true -o miniWeather_3 ./miniWeather" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "You can see that the changes actually slowed down the code: it now runs slower than the non-accelerated, CPU-only version. Let's check out the profiler's report. [Download the profiler output](../source_code/lab2/miniWeather_3.qdrep) and open it via the GUI. \n", "\n", "From the \"timeline view\" in the top pane, double-click on \"CUDA\" in the function table on the left and expand it. Zoom in on the timeline and you can see a pattern similar to the screenshot below. The blue boxes are the compute kernels, and each of these groupings of kernels is surrounded by purple and teal boxes (annotated in red) representing data movements. **The screenshots represent the profiler report for the values 400, 200, 1500.**\n", "\n", "Hover your mouse over the kernels (blue boxes) one by one in each row and check out the provided information.\n", "\n", "**Note**: In the next two exercises, we start optimizing the application by improving occupancy and reducing data movements." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Post-Lab Summary\n", "\n", "If you would like to download this lab for later viewing, it is recommended that you go to your browser's File menu (not the Jupyter notebook File menu) and save the complete web page. This will ensure the images are copied down as well. You can also execute the following cell block to create a zip file of the files you have been working on, and download it with the link below." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "cd ..\n", "rm -f openacc_profiler_files.zip\n", "zip -r openacc_profiler_files.zip *" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**After** executing the above zip command, you should be able to download the zip file [here](../openacc_profiler_files.zip)." ] },
\n", "\n", "-----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Links and Resources\n", "\n", "[OpenACC API Guide](https://www.openacc.org/sites/default/files/inline-files/OpenACC%20API%202.6%20Reference%20Guide.pdf)\n", "\n", "[NVIDIA Nsight System](https://docs.nvidia.com/nsight-systems/)\n", "\n", "[CUDA Toolkit Download](https://developer.nvidia.com/cuda-downloads)\n", "\n", "**NOTE**: To be able to see the Nsight System profiler output, please download Nsight System latest version from [here](https://developer.nvidia.com/nsight-systems).\n", "\n", "Don't forget to check out additional [OpenACC Resources](https://www.openacc.org/resources) and join our [OpenACC Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.\n", "\n", "--- \n", "\n", "## Licensing \n", "\n", "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0). " ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 1 }