|
@@ -29,7 +29,7 @@
|
|
|
"There is a very thin line between these categories and within that limited scope and view we could categorize different approaches as follows:\n",
|
|
|
"\n",
|
|
|
" \n",
|
|
|
- "| | OpenACC | OpenMP | stdpar | Kokkos | CUDA Laguages |\n",
|
|
|
+ "| | OpenACC | OpenMP | DO-CONCURRENT | Kokkos | CUDA Laguages |\n",
|
|
|
"| --- | --- | --- | --- | --- | --- |\n",
|
|
|
"| Ease | High | High | High | Intermediate | Low |\n",
|
|
|
"| Performance | Depends | Depends | Depends | High | Best |\n",
|
|
@@ -40,29 +40,26 @@
|
|
|
"\n",
|
|
|
"## Ease of Programming\n",
|
|
|
"- The directive‐based OpenMP and OpenACC programming models are generally least intrusive when applied to the loops. \n",
|
|
|
- "- Kokkos required restructuring of the existing code for the parallel dispatch via functors or lambda functions\n",
|
|
|
"- CUDA required a comparable amount of rewriting effort, in particular, to map the loops onto a CUDA grid of threads and thread blocks\n",
|
|
|
- "- stdpar also required us to change the constructs to make use of C++17 templates and may be preferred for new developments having C++ template style coding. \n",
|
|
|
- "- The overhead for OpenMP and OpenACC in terms of lines of code is the smallest, followed by stdpar and Kokkos\n",
|
|
|
+ "- DO-CONCURRENT also required us to do minimal change by replacing the *do* loop to *do concurrent* . \n",
|
|
|
+ "- The overhead for OpenMP, OpenACC and DO-CONCURRENT in terms of lines of code is the smallest\n",
|
|
|
"\n",
|
|
|
"## Performance\n",
|
|
|
"While we have not gone into the details of optimization for any of these programming model the analysis provided here is based on the general design of the programming model itself.\n",
|
|
|
"\n",
|
|
|
- "- Kokkos when compiled enables the use of correct compiler optimization flags for the respective platform, while for the other frameworks, the user has to set these flags manually. This gives kokkos an upper hand over OpenACC and OpenMP. \n",
|
|
|
"- OpenACC and OpenMP abstract model defines a least common denominator for accelerator devices, but cannot represent architectural specifics of these devices without making the language less portable.\n",
|
|
|
- "- stdpar on the other hand is more abstract and gives less control to developers to optimize the code\n",
|
|
|
+ "- DO-CONCURRENT on the other hand is more abstract and gives less control to developers to optimize the code\n",
|
|
|
"\n",
|
|
|
"## Portability\n",
|
|
|
- "We observed the same code being run on moth multicore and GPU using OpenMP, OpenACC, Kokkos and stdpar. The point we highlight here is how a programming model supports the divergent cases where developers may choose to use different directive variant to get more performance. In a real application the tolerance for this portability/performance trade-off will vary according to the needs of the programmer and application \n",
|
|
|
+ "We observed the same code being run on moth multicore and GPU using OpenMP, OpenACC and DO-CONCURRENT. The point we highlight here is how a programming model supports the divergent cases where developers may choose to use different directive variant to get more performance. In a real application the tolerance for this portability/performance trade-off will vary according to the needs of the programmer and application \n",
|
|
|
"- OpenMP supports [Metadirective](https://www.openmp.org/spec-html/5.0/openmpsu28.html) where the developer can choose to activate different directive variant based on the condition selected.\n",
|
|
|
"- In OpenACC when using ```kernel``` construct, the compiler is responsible for mapping and partitioning the program to the underlying hardware. Since the compiler will mostly take care of the parallelization issues, the descriptive approach may generate performance code for specific architecture. The downside is the quality of the generated accelerated code depends significantly on the capability of the compiler used and hence the term \"may\".\n",
|
|
|
"\n",
|
|
|
"\n",
|
|
|
"## Support\n",
|
|
|
- "- Kokkos project is very well documented and the developers support on GitHub is excellent \n",
|
|
|
"- OpenACC implementation is present in most popular compilers like NVIDIA HPC SDK, PGI, GCC, Clang and CRAY. \n",
|
|
|
"- OpenMP GPU support is currently available on limited compilers but being the most supported programming model for multicore it is matter of time when it comes at par with other models for GPU support.\n",
|
|
|
- "- stdpar being part of the C++ standard is bound to become integral part of most compiler supporting parallelism. \n",
|
|
|
+ "- DO-CONCURRENT being part of the ISO Fortran standard is bound to become integral part of most compiler supporting parallelism. \n",
|
|
|
"\n",
|
|
|
"\n",
|
|
|
"Parallel Computing in general has been a difficult task and requires developers not just to know a programming approach but also think in parallel. While this tutorial provide you a good start, it is highly recommended to go through Profiling and Optimization bootcamps as next steps.\n",
|
|
@@ -110,7 +107,7 @@
|
|
|
"name": "python",
|
|
|
"nbconvert_exporter": "python",
|
|
|
"pygments_lexer": "ipython3",
|
|
|
- "version": "3.7.4"
|
|
|
+ "version": "3.6.2"
|
|
|
}
|
|
|
},
|
|
|
"nbformat": 4,
|