|
@@ -11,7 +11,14 @@
|
|
"# Numba Lab1: Numba For CUDA GPU\n",
|
|
"# Numba Lab1: Numba For CUDA GPU\n",
|
|
"---\n",
|
|
"---\n",
|
|
"\n",
|
|
"\n",
|
|
- "Before we begin, let's execute the cell below to display information about the CUDA driver and GPUs running on the server by running the `nvidia-smi` command. To do this, execute the cell block below by giving it focus (clicking on it with your mouse), and hitting Ctrl-Enter, or pressing the play button in the toolbar above. If all goes well, you should see some output returned below the grey cell."
|
|
|
|
|
|
+ "## Learning Objectives\n",
|
|
|
|
+ "- **The goal of this lab is to:**\n",
|
|
|
|
+ " - enable you to quickly start using Numba (beginner to advanced level)\n",
|
|
|
|
+ " - teach you to apply the concepts of CUDA GPU programming to HPC field(s); and\n",
|
|
|
|
+ " - show you how to achieve computational speedup on GPUs to maximize the throughput of your HPC implementation.\n",
|
|
|
|
+ "\n",
|
|
|
|
+ "\n",
|
|
|
|
+ "Before we begin, let's run the `nvidia-smi` command in the cell below to display information about the CUDA driver and the GPUs on the server. To do this, click on the cell to give it focus and press Ctrl-Enter, or press the play button in the toolbar above. You should see some output returned below the grey cell."
|
|
]
|
|
]
|
|
},
|
|
},
|
|
{
|
|
{
|
|
@@ -27,50 +34,41 @@
|
|
"cell_type": "markdown",
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"metadata": {},
|
|
"source": [
|
|
"source": [
|
|
- "\n",
|
|
|
|
- "\n",
|
|
|
|
- "### Learning Objectives\n",
|
|
|
|
- "- **The goal of this lab is to:**\n",
|
|
|
|
- " - quickly get you started with Numba from beginner to advanced level\n",
|
|
|
|
- " - teach you application of CUDA GPU programming concept in HPC field(s)\n",
|
|
|
|
- " - show you how to maximize the throughput of your HPC implementation through computational speedup on the GPU. \n",
|
|
|
|
" \n",
|
|
" \n",
|
|
- "\n",
|
|
|
|
- "\n",
|
|
|
|
"## Introduction\n",
|
|
"## Introduction\n",
|
|
- "- Numba is a just-in-time (jit) compiler for Python that works best on code that uses NumPy arrays, functions, and loops. Numba has set of decorators that can be specified before user-defined functions to determine how they are compiled. \n",
|
|
|
|
- "- A decorated function written in python is compiled into CUDA kernel to speed up execution rate, thus, Numba supports CUDA GPU programming model. \n",
|
|
|
|
- "- A kernel is written in Numba automatically have direct access to NumPy arrays. This implies a great support for data visiblilty between the host (CPU) and the device (GPU). \n",
|
|
|
|
|
|
+ "- Numba is a just-in-time (JIT) compiler for Python that works best on code that uses NumPy arrays, functions, and loops. Numba provides a set of decorators that can be placed above user-defined functions to determine how they are compiled. \n",
|
|
|
|
+ "- Numba supports the CUDA GPU programming model. A decorated function written in Python is compiled into a CUDA kernel to speed up execution. \n",
|
|
|
|
+ "- A kernel written in Numba automatically has direct access to NumPy arrays. This shows great support for data visibility between the host (CPU) and the device (GPU). \n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"### Definition of Terms\n",
|
|
"### Definition of Terms\n",
|
|
"- The CPU is called a **Host**. \n",
|
|
"- The CPU is called a **Host**. \n",
|
|
"- The GPU is called a **Device**.\n",
|
|
"- The GPU is called a **Device**.\n",
|
|
- "- A GPU function launched by the host and executed on the device is called a **Kernels**.\n",
|
|
|
|
- "- A GPU function executed on the device which can only be called from the device is called a **Device function**.\n",
|
|
|
|
|
|
+ "- A GPU function launched by the host and executed on the device is called a **Kernel**.\n",
|
|
|
|
+ "- A GPU function that is executed on the device and can only be called from the device is called a **Device function**.\n",
|
|
"\n",
|
|
"\n",
|
|
"### Note\n",
|
|
"### Note\n",
|
|
- "- It is recommended to visit the NVIDIA official documentary web page and read through [CUDA C programming guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide), because most CUDA programming features exposed by Numba map directly to the CUDA C language offered by NVidia. \n",
|
|
|
|
- "- Numba does not implement of these CUDA features of CUDA:\n",
|
|
|
|
|
|
+ "- It is recommended to visit the NVIDIA official documentation web page and read through [CUDA C programming guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide), because most CUDA programming features exposed by Numba map directly to the CUDA C language offered by NVIDIA. \n",
|
|
|
|
+ "- Numba does not implement these CUDA features:\n",
|
|
" - dynamic parallelism\n",
|
|
" - dynamic parallelism\n",
|
|
" - texture memory\n",
|
|
" - texture memory\n",
|
|
"\n",
|
|
"\n",
|
|
"## CUDA Kernel\n",
|
|
"## CUDA Kernel\n",
|
|
"- In CUDA, the same code can be executed by hundreds or thousands of threads in a single launch; a solution is therefore modeled on the following thread hierarchy: \n",
|
|
"- In CUDA, the same code can be executed by hundreds or thousands of threads in a single launch; a solution is therefore modeled on the following thread hierarchy: \n",
|
|
- " - **Grid**: A kernel executed as a collection of blocks \n",
|
|
|
|
- " - **Thread Block**: Collection of threads that can communicate via a shared memory. Each thread is executed by a core.\n",
|
|
|
|
|
|
+ " - **Grid**: A kernel executed as a collection of blocks. \n",
|
|
|
|
+ " - **Thread Block**: Collection of threads that can communicate via shared memory. Each thread is executed by a core.\n",
|
|
" - **Thread**: Single execution units that run kernels on GPU.\n",
|
|
" - **Thread**: Single execution units that run kernels on GPU.\n",
|
|
"- Numba exposes three kinds of GPU memory: \n",
|
|
"- Numba exposes three kinds of GPU memory: \n",
|
|
" - global device memory \n",
|
|
" - global device memory \n",
|
|
" - shared memory \n",
|
|
" - shared memory \n",
|
|
- " - local memory. \n",
|
|
|
|
|
|
+ " - local memory \n",
|
|
"- Memory access should be carefully considered in order to keep bandwidth contention to a minimum.\n",
|
|
"- Memory access should be carefully considered in order to keep bandwidth contention to a minimum.\n",
|
|
"\n",
|
|
"\n",
|
|
" <img src=\"../images/thread_blocks.JPG\"/> <img src=\"../images/memory_architecture.png\"/> \n",
|
|
" <img src=\"../images/thread_blocks.JPG\"/> <img src=\"../images/memory_architecture.png\"/> \n",
|
|
"\n",
|
|
"\n",
|
|
"### Kernel Declaration\n",
|
|
"### Kernel Declaration\n",
|
|
- "- A kernel function is a GPU function that is called from a CPU code by specifying the number of block threads and threads per block, and can not explicitly return a value except through a passed array. \n",
|
|
|
|
- "- A kernel can be called multiple times with varying number of blocks per grid and threads per block after its has been compiled once.\n",
|
|
|
|
|
|
+ "- A kernel function is a GPU function that is called from CPU code. Launching it requires specifying the number of blocks per grid and the number of threads per block. A kernel cannot explicitly return a value; results are written to an array passed as an argument. \n",
|
|
|
|
+ "- A kernel can be called multiple times with a varying number of blocks per grid and threads per block after it has been compiled once.\n",
|
|
"\n",
|
|
"\n",
|
|
"Example:\n",
|
|
"Example:\n",
|
|
"\n",
|
|
"\n",
|
|
@@ -89,12 +87,12 @@
|
|
"```\n",
|
|
"```\n",
|
|
"\n",
|
|
"\n",
|
|
"###### Choosing Block Size\n",
|
|
"###### Choosing Block Size\n",
|
|
- "- The block size determines how many threads share a given area of shared memory.\n",
|
|
|
|
|
|
+ "- The block size determines how many threads share a given area of the shared memory.\n",
|
|
"- The block size must be large enough to accommodate all computation units. See more details [here](https://docs.nvidia.com/cuda/cuda-c-programming-guide/).\n",
|
|
"- The block size must be large enough to accommodate all computation units. See more details [here](https://docs.nvidia.com/cuda/cuda-c-programming-guide/).\n",
|
|
"\n",
|
|
"\n",
|
|
"### Thread Positioning \n",
|
|
"### Thread Positioning \n",
|
|
- "- When running a kernel, the kernel function’s code is executed by every thread once. Hence is it important to uniquely identify distinct threads.\n",
|
|
|
|
- "- The default way to determine a thread position in a grid and block is to manually compute the corresponding array position:\n",
|
|
|
|
|
|
+ "- When running a kernel, the kernel function’s code is executed by every thread once. Therefore, it is important to uniquely identify distinct threads.\n",
|
|
|
|
+ "- The default way to determine a thread position in a grid and block is to manually compute the corresponding array positions:\n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"<img src=\"../images/thread_position.png\"/>\n",
|
|
"<img src=\"../images/thread_position.png\"/>\n",
|
|
@@ -154,21 +152,21 @@
|
|
"> - N is the size of the array and the number of threads in a single block is 128.\n",
|
|
"> - N is the size of the array and the number of threads in a single block is 128.\n",
|
|
"> - The **cuda.jit()** decorator indicates that the function (arrayAdd) below is a device kernel and should run in parallel. **tid** is the unique index computed for each thread in the grid: \n",
|
|
"> - The **cuda.jit()** decorator indicates that the function (arrayAdd) below is a device kernel and should run in parallel. **tid** is the unique index computed for each thread in the grid: \n",
|
|
">> **tid = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x**.\n",
|
|
">> **tid = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x**.\n",
|
|
- "> - **array_A** and **array_B** are input data, while **array_out** is output array and is already preload with zeros.\n",
|
|
|
|
- "> - The statement **blockpergrid = N + (threadsperblock - 1) // threadsperblock** Computes the size of block per grid. This line of code is commonly use as the default formular to estimate number of blocks per grid in several GPU programming documentations.\n",
|
|
|
|
- "> - **arrayAdd[blockpergrid, threadsperblock](array_A, array_B, array_out)** indicate a call to a kernel function **addAdd** having the number of blocks per grid and number of threads per block in square bracket, while kernel arguments are in round brackets.\n",
|
|
|
|
|
|
+ "> - **array_A** and **array_B** are input data, while **array_out** is the output array and is already preloaded with zeros.\n",
|
|
|
|
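+ "> - The statement **blockpergrid = (N + (threadsperblock - 1)) // threadsperblock** computes the number of blocks per grid. The outer parentheses matter: without them, `//` binds tighter than `+`, so the expression would evaluate to N blocks instead of the rounded-up quotient. This ceiling division is commonly used as the default formula to estimate the number of blocks per grid in GPU programming documentation.\n",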
|
|
|
|
+ "> - **arrayAdd[blockpergrid, threadsperblock](array_A, array_B, array_out)** is a call to the kernel function **arrayAdd**, with the number of blocks per grid and the number of threads per block in square brackets and the kernel arguments in round brackets.\n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
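The block-count estimate and the thread index described in the note above can be checked without a GPU. The following is a minimal pure-Python sketch, not Numba code: the helper names `blocks_per_grid` and `global_thread_ids` and the value `N = 500` are assumptions chosen for illustration, and the loops only enumerate the indices that a 1D launch would produce.

```python
def blocks_per_grid(n, threads_per_block):
    """Smallest number of blocks whose threads cover n elements (ceiling division)."""
    return (n + threads_per_block - 1) // threads_per_block


def global_thread_ids(num_blocks, threads_per_block):
    """Enumerate tid = threadIdx.x + blockIdx.x * blockDim.x for a 1D grid."""
    return [
        thread_idx + block_idx * threads_per_block
        for block_idx in range(num_blocks)
        for thread_idx in range(threads_per_block)
    ]


N = 500                       # hypothetical array size
threads_per_block = 128
num_blocks = blocks_per_grid(N, threads_per_block)
tids = global_thread_ids(num_blocks, threads_per_block)

# Some launched threads fall past the end of the array; a kernel
# typically skips them with a guard such as `if tid < N:`.
active = [tid for tid in tids if tid < N]
print(num_blocks, len(tids), len(active))
```

Because the block count is rounded up, the launch usually contains a few more threads than array elements, which is why the bounds check inside the kernel is needed.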
|
|
- "### Matrix multiplication on 2D Array \n",
|
|
|
|
|
|
+ "### Matrix Multiplication on 2D Array \n",
|
|
"\n",
|
|
"\n",
|
|
"<img src=\"../images/2d_array.png\"/>\n",
|
|
"<img src=\"../images/2d_array.png\"/>\n",
|
|
"\n",
|
|
"\n",
|
|
"<img src=\"../images/2d_col_mult.png\"/>\n",
|
|
"<img src=\"../images/2d_col_mult.png\"/>\n",
|
|
"\n",
|
|
"\n",
|
|
"> **Note**\n",
|
|
"> **Note**\n",
|
|
- "> - **Approach 2** would not be possible if the matrix size exceed the maximum number of threads per block on the device, while **Approach 1** would continue to execute. Most latest GPUs have maximum of 1024 threads per thread block. \n",
|
|
|
|
|
|
+ "> - **Approach 2** would not be possible if the matrix size exceeds the maximum number of threads per block on the device, while **Approach 1** would continue to execute. The latest GPUs have a maximum of 1024 threads per thread block. \n",
|
|
"\n",
|
|
"\n",
|
|
"### Example 2: Matrix Multiplication "
|
|
"### Example 2: Matrix Multiplication "
|
|
]
|
|
]
|
|
@@ -215,7 +213,7 @@
|
|
"cell_type": "markdown",
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"metadata": {},
|
|
"source": [
|
|
"source": [
|
|
- "### Exaample 3: A 225 × 225 Matrix Multiplication"
|
|
|
|
|
|
+ "### Example 3: A 225 × 225 Matrix Multiplication"
|
|
]
|
|
]
|
|
},
|
|
},
|
|
{
|
|
{
|
|
@@ -251,11 +249,11 @@
|
|
"cell_type": "markdown",
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"metadata": {},
|
|
"source": [
|
|
"source": [
|
|
- "### Thread reuse \n",
|
|
|
|
|
|
+ "### Thread Reuse \n",
|
|
"\n",
|
|
"\n",
|
|
- "- It is possible to specify a few number of threads for a data size such that threads are reused to complete the computation of the entire data. This is one of the approach used when a data to be computed is larger than the maximum number of threads available in a device memory. \n",
|
|
|
|
|
|
+ "- It is possible to specify a small number of threads for a given data size such that threads are reused to complete the computation over the entire data. This is one of the approaches used when the data to be processed is larger than the maximum number of threads available on the device. \n",
|
|
"- This statement is used in a while loop: ***tid += cuda.blockDim.x * cuda.gridDim.x***\n",
|
|
"- This statement is used in a while loop: ***tid += cuda.blockDim.x * cuda.gridDim.x***\n",
|
|
- "- An example is given below to illustrates thread reuse. In the example, small number of thread is specified on purpose in order to show the possibility of this approach. \n",
|
|
|
|
|
|
+ "- An example is given below to illustrate thread reuse. In the example, a small number of threads is specified on purpose in order to show the possibility of this approach. \n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"#### Example 4: "
|
|
"#### Example 4: "
|
|
@@ -295,7 +293,7 @@
|
|
"metadata": {},
|
|
"metadata": {},
|
|
"source": [
|
|
"source": [
|
|
"> **Note**\n",
|
|
"> **Note**\n",
|
|
- "> - The task in **example 4** is the same as in **example 1** but with limited number of threads specified, howbeit, the same result was achieved. \n",
|
|
|
|
|
|
+ "> - The task in **example 4** is the same as in **example 1** but with a limited number of threads specified; nevertheless, the same result was achieved. \n",
|
|
"> - Note that this approach may delegate more threads than required. In the code above, an excess of 1 block of threads may be delegated.\n"
|
|
"> - Note that this approach may delegate more threads than required. In the code above, an excess of 1 block of threads may be delegated.\n"
|
|
]
|
|
]
|
|
},
|
|
},
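The thread-reuse pattern can be traced on the CPU to see why a small grid still covers the whole array. This is a minimal sequential emulation in plain Python, not a Numba kernel: the function name `array_add_with_reuse` and the launch sizes are assumptions for illustration, and the two outer loops stand in for the threads that a GPU would run concurrently.

```python
def array_add_with_reuse(a, b, num_blocks, threads_per_block):
    """Sequentially emulate a 1D kernel in which each thread strides over the data."""
    n = len(a)
    out = [0] * n
    stride = threads_per_block * num_blocks   # cuda.blockDim.x * cuda.gridDim.x
    for block_idx in range(num_blocks):
        for thread_idx in range(threads_per_block):
            tid = thread_idx + block_idx * threads_per_block
            while tid < n:                    # the same thread handles several elements
                out[tid] = a[tid] + b[tid]
                tid += stride                 # jump ahead by the total thread count
    return out


a = list(range(1000))
b = [2 * x for x in a]

# Only 2 * 8 = 16 emulated threads process 1000 elements.
result = array_add_with_reuse(a, b, num_blocks=2, threads_per_block=8)
print(result[:5], result[-1])
```

Each emulated thread starts at its own `tid` and then advances by the total number of threads in the grid, so no element is visited twice and none is skipped.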
|
|
@@ -306,7 +304,7 @@
|
|
"## Memory Management\n",
|
|
"## Memory Management\n",
|
|
"\n",
|
|
"\n",
|
|
"### Data Transfer \n",
|
|
"### Data Transfer \n",
|
|
- "- When a kernel is excuted, Numba automatically transfer NumPy arrays to the device and vice versa.\n",
|
|
|
|
|
|
+ "- When a kernel is executed, Numba automatically transfers NumPy arrays to the device and vice versa.\n",
|
|
"- To avoid unnecessary transfers of read-only arrays, the following APIs can be used to manually control the transfer.\n",
|
|
"- To avoid unnecessary transfers of read-only arrays, the following APIs can be used to manually control the transfer.\n",
|
|
"\n",
|
|
"\n",
|
|
"##### 1. Copy host to device\n",
|
|
"##### 1. Copy host to device\n",
|
|
@@ -334,7 +332,7 @@
|
|
"h_C = d_C.copy_to_host()\n",
|
|
"h_C = d_C.copy_to_host()\n",
|
|
"h_C = d_C.copy_to_host(stream=stream)\n",
|
|
"h_C = d_C.copy_to_host(stream=stream)\n",
|
|
"```\n",
|
|
"```\n",
|
|
- "### Example 5: data movement "
|
|
|
|
|
|
+ "### Example 5: Data Movement "
|
|
]
|
|
]
|
|
},
|
|
},
|
|
{
|
|
{
|
|
@@ -375,8 +373,8 @@
|
|
"source": [
|
|
"source": [
|
|
"## Atomic Operation\n",
|
|
"## Atomic Operation\n",
|
|
"\n",
|
|
"\n",
|
|
- "- Atomic operation is required in a situation where multiple threads attempt to modify a common portion of the memory. \n",
|
|
|
|
- "- Typical example includes: simultaneous withdrawal from a bank account through ATM machine or large number of threads modfying a particular index of an array based on certain condition(s)\n",
|
|
|
|
|
|
+ "- An atomic operation is required when multiple threads attempt to modify a common portion of the memory. \n",
|
|
|
|
+ "- Typical examples include simultaneous withdrawals from a bank account through an ATM, or a large number of threads modifying a particular index of an array based on certain condition(s).\n",
|
|
"- The atomic operations currently supported by Numba are:\n",
|
|
"- The atomic operations currently supported by Numba are:\n",
|
|
"> **import numba.cuda as cuda**\n",
|
|
"> **import numba.cuda as cuda**\n",
|
|
"> - cuda.atomic.add(array, index, value)\n",
|
|
"> - cuda.atomic.add(array, index, value)\n",
|
|
@@ -436,7 +434,7 @@
|
|
"### 7. CUDA Ufuncs\n",
|
|
"### 7. CUDA Ufuncs\n",
|
|
"\n",
|
|
"\n",
|
|
"- The CUDA ufunc supports passing intra-device arrays to reduce traffic over the PCI-express bus. \n",
|
|
"- The CUDA ufunc supports passing intra-device arrays to reduce traffic over the PCI-express bus. \n",
|
|
- "- It also support asynchronous mode by using stream keyword.\n",
|
|
|
|
|
|
+ "- It also supports asynchronous mode by using the stream keyword.\n",
|
|
"\n",
|
|
"\n",
|
|
"<img src=\"../images/ufunc.png\"/>"
|
|
"<img src=\"../images/ufunc.png\"/>"
|
|
]
|
|
]
|
|
@@ -447,6 +445,9 @@
|
|
"metadata": {},
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"outputs": [],
|
|
"source": [
|
|
"source": [
|
|
|
|
+ "# example: c = (a - b) * (a + b)\n",
|
|
|
|
+ "# size of each array(A, B, C) is N = 10000\n",
|
|
|
|
+ "\n",
|
|
"from numba import vectorize\n",
|
|
"from numba import vectorize\n",
|
|
"import numba.cuda as cuda\n",
|
|
"import numba.cuda as cuda\n",
|
|
"import numpy as np\n",
|
|
"import numpy as np\n",
|
|
@@ -467,9 +468,10 @@
|
|
"cell_type": "markdown",
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"metadata": {},
|
|
"source": [
|
|
"source": [
|
|
- "#### Device function\n",
|
|
|
|
|
|
+ "### Device Function\n",
|
|
"\n",
|
|
"\n",
|
|
- "- The CUDA device functions can only be invoked from within the device and can return a value like normal functions. The device function is usually placed before the CUDA ufunc kernel otherwise a call to the device function may not be visible inside the ufunc kernel."
|
|
|
|
|
|
+ "- CUDA device functions can only be invoked from within the device and can return a value like normal functions. The device function is usually placed before the CUDA ufunc kernel; otherwise, a call to the device function may not be visible inside the ufunc kernel.\n",
|
|
|
|
+ "- The argument <i>device=True</i> marks <i>\"device_ufunc\"</i> as a device function, and <i>inline=True</i> requests that it be inlined at the call site."
|
|
]
|
|
]
|
|
},
|
|
},
|
|
{
|
|
{
|
|
@@ -478,6 +480,8 @@
|
|
"metadata": {},
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"outputs": [],
|
|
"source": [
|
|
"source": [
|
|
|
|
+ "# example: c = sqrt((a - b) * (a + b))\n",
|
|
|
|
+ "\n",
|
|
"from numba import vectorize\n",
|
|
"from numba import vectorize\n",
|
|
"import numba.cuda as cuda\n",
|
|
"import numba.cuda as cuda\n",
|
|
"import numpy as np\n",
|
|
"import numpy as np\n",
|
|
@@ -506,7 +510,7 @@
|
|
"\n",
|
|
"\n",
|
|
"## Lab Task\n",
|
|
"## Lab Task\n",
|
|
"\n",
|
|
"\n",
|
|
- "In this section, you are expected to click on the **Serial code Lab Assignment** link and proceed to Lab 2. In this lab you will find three python serial code functions. You are required to revise the **pair_gpu** function and make it run on the GPU, and likewise do a few modifications on the **main** function.\n",
|
|
|
|
|
|
+ "In this section, you are expected to click on the **Serial Code Lab Assignment** link and proceed to Lab 2. In this lab you will find three Python serial code functions. You are required to revise the **pair_gpu** function to run on the GPU, and likewise make a few modifications within the **main** function.\n",
|
|
"\n",
|
|
"\n",
|
|
"## <div style=\"text-align:center; color:#FF0000; border:3px solid red;height:80px;\"> <b><br/> [Serial Code Lab Assignment](serial_RDF.ipynb) </b> </div>\n",
|
|
"## <div style=\"text-align:center; color:#FF0000; border:3px solid red;height:80px;\"> <b><br/> [Serial Code Lab Assignment](serial_RDF.ipynb) </b> </div>\n",
|
|
"\n",
|
|
"\n",
|
|
@@ -514,7 +518,7 @@
|
|
"\n",
|
|
"\n",
|
|
"## Post-Lab Summary\n",
|
|
"## Post-Lab Summary\n",
|
|
"\n",
|
|
"\n",
|
|
- "If you would like to download this lab for later viewing, it is recommend you go to your browsers File menu (not the Jupyter notebook file menu) and save the complete web page. This will ensure the images are copied down as well. You can also execute the following cell block to create a zip-file of the files you've been working on, and download it with the link below.\n",
|
|
|
|
|
|
+ "If you would like to download this lab for later viewing, we recommend you go to your browser's File menu (not the Jupyter notebook file menu) and save the complete web page. This will ensure the images are copied as well. You can also execute the following cell block to create a zip-file of the files you've been working on and download it with the link below.\n",
|
|
"\n"
|
|
"\n"
|
|
]
|
|
]
|
|
},
|
|
},
|
|
@@ -550,9 +554,9 @@
|
|
"\n",
|
|
"\n",
|
|
"[NVIDIA Nsight System](https://docs.nvidia.com/nsight-systems/)\n",
|
|
"[NVIDIA Nsight System](https://docs.nvidia.com/nsight-systems/)\n",
|
|
"\n",
|
|
"\n",
|
|
- "[CUDA Toolkit Download](https://developer.nvidia.com/cuda-downloads)\n",
|
|
|
|
|
|
+ "[NVIDIA CUDA Toolkit](https://developer.nvidia.com/cuda-downloads)\n",
|
|
"\n",
|
|
"\n",
|
|
- "**NOTE**: To be able to see the Nsight System profiler output, please download Nsight System latest version from [here](https://developer.nvidia.com/nsight-systems).\n",
|
|
|
|
|
|
+ "**NOTE**: To be able to see the Nsight System profiler output, please download the latest version of Nsight System from [here](https://developer.nvidia.com/nsight-systems).\n",
|
|
"\n",
|
|
"\n",
|
|
"Don't forget to check out additional [OpenACC Resources](https://www.openacc.org/resources) and join our [OpenACC Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.\n",
|
|
"Don't forget to check out additional [OpenACC Resources](https://www.openacc.org/resources) and join our [OpenACC Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.\n",
|
|
"\n",
|
|
"\n",
|
|
@@ -570,7 +574,7 @@
|
|
"\n",
|
|
"\n",
|
|
"## Licensing \n",
|
|
"## Licensing \n",
|
|
"\n",
|
|
"\n",
|
|
- "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0)."
|
|
|
|
|
|
+ "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0)."
|
|
]
|
|
]
|
|
}
|
|
}
|
|
],
|
|
],
|