@@ -232,7 +232,7 @@
    "outputs": [],
    "source": [
     "#compile for Tesla GPU\n",
-    "!cd ../../source_code/stdpar && nvc++ -std=c++17 -stdpar=gpu -o rdf rdf.cpp "
+    "!cd ../../source_code/stdpar && nvc++ -std=c++17 -DUSE_COUNTING_ITERATIOR -stdpar=gpu -o rdf rdf.cpp "
    ]
   },
   {
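For context, the `-DUSE_COUNTING_ITERATIOR` define added in the compile cell above typically guards a counting-iterator formulation of the main loop, which is what allows `nvc++ -stdpar=gpu` to offload a plain index range to the GPU. The snippet below is a minimal, self-contained sketch of that pattern under those assumptions; the variable names and loop body are illustrative and are not the actual `rdf.cpp` kernel.

```cpp
// Minimal sketch of a counting-iterator loop offloaded via -stdpar=gpu.
// Hypothetical example: names and the loop body are assumptions, not rdf.cpp.
#include <algorithm>
#include <execution>
#include <vector>
#include <thrust/iterator/counting_iterator.h>

int main() {
    const std::size_t n = 1 << 20;
    std::vector<double> out(n);
    double* p = out.data();  // raw pointer so the lambda capture is GPU-friendly

    // Iterating over indices through a counting iterator lets the parallel
    // algorithm replace a raw for-loop and be compiled for the GPU.
    std::for_each(std::execution::par_unseq,
                  thrust::counting_iterator<std::size_t>(0),
                  thrust::counting_iterator<std::size_t>(n),
                  [=](std::size_t i) { p[i] = 0.5 * static_cast<double>(i); });
    return 0;
}
```

A file like this would be built the same way as the cell above, e.g. `nvc++ -std=c++17 -stdpar=gpu -o demo demo.cpp`.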
@@ -284,7 +284,7 @@
    "\n",
    "If you inspect the output of the profiler closer, you can see the usage of *Unified Memory* annotated with green rectangle which was explained in previous sections.\n",
    "\n",
-    "Moreover, if you compare the NVTX marker `Pair_Calculation` (from the NVTX row) in both multicore and GPU version, you can see how much improvement you achieved. In the *example screenshot*, we were able to reduce that range from 1.52 seconds to 225.8 mseconds.\n",
+    "Moreover, if you compare the NVTX marker `Pair_Calculation` (from the NVTX row) in both multicore and GPU version, you can see how much improvement you achieved. In the *example screenshot*, we were able to reduce that range from 1.52 seconds to 188.4 mseconds.\n",
    "\n",
    "Feel free to checkout the [solution](../../source_code/stdpar/SOLUTION/rdf.cpp) to help you understand better or compare your implementation with the sample solution."
   ]
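The `Pair_Calculation` range compared above in the multicore and GPU timelines comes from an NVTX annotation around the pair-distance computation. Below is a hedged sketch of how such a range can be opened and closed with the NVTX C API as shipped with recent CUDA/NVHPC toolkits; the surrounding function is hypothetical, and only the marker string matches the notebook.

```cpp
// Illustrative only: emitting a "Pair_Calculation" NVTX range so it appears
// in the Nsight Systems NVTX row. The enclosing function is an assumption.
#include <nvtx3/nvToolsExt.h>

void compute_pairs(/* ...simulation data... */) {
    nvtxRangePushA("Pair_Calculation");  // open the range before the pair loop
    // ... pair-distance work (e.g. the stdpar loop sketched earlier) ...
    nvtxRangePop();                      // close it so its duration is recorded
}
```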