4 years ago · 02ceb4de77
--- a/README.md
+++ b/README.md
@@ -1,11 +1,32 @@
 
				 #  GPUBootcamp Official Training Materials
			
 
				+GPU Bootcamps are designed to help build confidence in Accelerated Computing and eventually prepare developers to enroll in [Hackathons](http://gpuhackathons.org/).
			
 
				+
			
 
				 This repository consists of GPU bootcamp material for HPC, AI and convergence of both:
			
 
				 
			
 
				-- [AI](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/ai)
			
 
				+- [HPC](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/hpc) :: 
			
 
				+The bootcamp content focuses on how to follow the Analyze, Parallelize and Optimize Cycle to write parallel code that accelerates HPC simulations using different parallel programming models.
			
 
				+
			
 
				+| Lab      | Description |
			
 
				+| ----------- | ----------- |
			
 
				+| [N-Ways](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/hpc/nways)      | This lab covers multiple GPU programming models and helps you choose the one that best fits your needs. The material supports different programming languages, including C (CUDA C, OpenACC C, OpenMP C, C++ stdpar) and Fortran (CUDA Fortran, OpenACC Fortran, OpenMP Fortran, ISO DO CONCURRENT)       |
			
 
				+| [OpenACC](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/hpc/openacc)   | This lab covers how to write a portable parallel program that can run on multicore CPUs and accelerators like GPUs, and how to apply incremental parallelization strategies using OpenACC       |
			
 
				+
			
 
				+- [Convergence of HPC and AI](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/hpc_ai) :: 
			
 
				+The bootcamp content focuses on how AI can accelerate HPC simulations by introducing concepts of Deep Neural Networks, including data pre-processing and techniques for building, comparing and improving the accuracy of deep learning models.
			
 
				 
			
 
				-- [HPC](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/hpc)
			
 
				+| Lab      | Description |
			
 
				+| ----------- | ----------- |
			
 
				+| [Weather Pattern Recognition](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/hpc_ai/ai_science_climate)      | This Bootcamp will introduce developers to the fundamentals of AI and how a data-driven approach can be applied to the Climate/Weather domain |
			
 
				+| [CFD Flow Prediction](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/hpc_ai/ai_science_cfd)      | This Bootcamp will introduce developers to the fundamentals of AI and how they can be applied to CFD (Computational Fluid Dynamics) |
			
 
				 
			
 
				-- [Convergence of HPC and AI](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/hpc_ai)
			
 
				+- [AI](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/ai)::
			
 
				+The bootcamp content focuses on using popular accelerated AI frameworks and optimization techniques to get maximum performance from accelerators such as GPUs.
			
 
				+
			
 
				+
			
 
				+| Lab      | Description |
			
 
				+| ----------- | ----------- |
			
 
				+| [Accelerated Intelligent Video Analytics](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/ai/DeepStream) | Learn how the NVIDIA DeepStream SDK can be used to create an optimized Intelligent Video Analytics (IVA) pipeline. Participants will be exposed to the building blocks for creating an IVA pipeline, followed by a profiling exercise to identify hotspots in the pipeline and methods to optimize it for higher throughput       |
			
 
				+| [Accelerated Data Science](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/ai/RAPIDS)   | Learn how the RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. Participants will be exposed to libraries that can be easily integrated into a daily data science pipeline and accelerate computations for faster execution       |
			
 
				 
			
 
				 # System Requirements
			
 
				 Each lab contains docker and singularity definition files. Follow the readme files inside each on how to build the container and run the labs inside it.
			
@@ -18,8 +39,5 @@ Each lab contains docker and singularity definition files. Follow the readme fil
 
				 - Bootcamp users may request newer training material or report a bug by filing a GitHub issue
			
 
				 - Please do go through the existing list of issues to get more details of upcoming features and bugs currently being fixed [Issues](https://github.com/gpuhackathons-org/gpubootcamp/issues)
			
 
				 
			
 
				-<!--# Slides:
			
 
				-The slides associated with these training materials can be downloaded from [Google Slides](https://drive.google.com/drive/folders/1laRYdu6mtSA29M6Xthc1jP8AEOtVnbBo?usp=sharing)-->
			
 
				-
			
 
				 ## Questions?
			
 
				 Please join [OpenACC Slack Channel](https://openacclang.slack.com/messages/openaccusergroup) for questions.
			
--- a/hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/openmp/nways_openmp.ipynb
+++ b/hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/openmp/nways_openmp.ipynb
@@ -200,7 +200,7 @@
 
				     "### ```teams``` directive\n",
			
 
				     "```teams``` directve creates a league of thread teams where the master thread of each team executes the region. Each of these master threads executes sequentially. Or in other words teams directive spawn 1 or more thread teams with the same number of threads. The execution continues on the master threads of each team (redundantly). There is no synchronization allowed between teams. \n",
			
 
				     "\n",
			
 
				-    "OpenMP calls that somewhere a gang, which might be a thread on the CPU or maying a CUDA threadblock or OpenCL workgroup. It will choose how many teams to create based on where you're running, only a few on a CPU (like 1 per CPU core) or lots on a GPU (1000's possibly). ```teams``` allow OpenMP code to scale from small CPUs to large GPUs because each one works completely independently of each other ```teams```.\n",
			
 
				+    "OpenMP calls each of these a team, which might be a thread on the CPU or maybe a CUDA threadblock or OpenCL workgroup. It will choose how many teams to create based on where you're running: only a few on a CPU (like 1 per CPU core) or lots on a GPU (possibly 1000s). ```teams``` allow OpenMP code to scale from small CPUs to large GPUs because each team works completely independently of the other ```teams```.\n",
			
 
				     "\n",
			
 
				     "<img src=\"../images/openmp_target_teams.png\">\n",
			
 
				     "\n",
			
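As an illustration of the `teams` description in the notebook text above, here is a minimal C++ sketch of a dot product written with a combined `target teams distribute parallel for` construct. The function name `dot_teams` and its signature are hypothetical and do not come from the lab sources. The sketch assumes both vectors have the same length. The pragma is guarded by `_OPENMP`, so the loop simply runs serially when OpenMP is not enabled.

```cpp
#include <vector>

// Hypothetical helper (not part of the lab sources): dot product whose loop
// iterations are distributed across a league of teams on the target device.
// Assumes a.size() == b.size().
double dot_teams(const std::vector<double>& a, const std::vector<double>& b)
{
    const double* pa = a.data();
    const double* pb = b.data();
    const int n = static_cast<int>(a.size());
    double sum = 0.0;

    // teams:        create a league of independent thread teams
    // distribute:   split the loop iterations across those teams
    // parallel for: share each team's chunk among the threads of that team
    // Without OpenMP the pragma is compiled out and the loop runs serially.
#ifdef _OPENMP
#pragma omp target teams distribute parallel for \
    map(to: pa[0:n], pb[0:n]) map(tofrom: sum) reduction(+:sum)
#endif
    for (int i = 0; i < n; ++i) {
        sum += pa[i] * pb[i];
    }
    return sum;
}
```

Calling `dot_teams({1.0, 2.0, 3.0}, {4.0, 5.0, 6.0})` returns 32. Because teams cannot synchronize with each other, the partial sums are combined through the `reduction` clause rather than through shared state.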
--- a/hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/stdpar/nways_stdpar.ipynb
+++ b/hpc/nways/nways_labs/nways_MD/English/C/jupyter_notebook/stdpar/nways_stdpar.ipynb
@@ -160,7 +160,10 @@
 
				    "outputs": [],
			
 
				    "source": [
			
 
				     "#Compile the code for muticore\n",
			
 
				-    "!cd ../../source_code/stdpar && nvc++ -std=c++17 -stdpar=multicore -o rdf rdf.cpp -I/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include -ltbb"
			
 
				+    "!cd ../../source_code/stdpar && nvc++ -std=c++17 -stdpar=multicore \\\n",
			
 
				+    "-I/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include \\\n",
			
 
				+    "-o rdf rdf.cpp -fopenmp \\\n",
			
 
				+    "-L/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64 -lnvToolsExt"
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -232,7 +235,7 @@
 
				    "outputs": [],
			
 
				    "source": [
			
 
				     "#compile for Tesla GPU\n",
			
 
				-    "!cd ../../source_code/stdpar && nvc++ -std=c++17 -DUSE_COUNTING_ITERATIOR  -stdpar=gpu -o rdf rdf.cpp "
			
 
				+    "!cd ../../source_code/stdpar && nvc++ -std=c++17 -DUSE_COUNTING_ITERATOR  -stdpar=gpu -o rdf rdf.cpp "
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -378,7 +381,7 @@
 
				    "name": "python",
			
 
				    "nbconvert_exporter": "python",
			
 
				    "pygments_lexer": "ipython3",
			
 
				-   "version": "3.6.2"
			
 
				+   "version": "3.7.4"
			
 
				   }
			
 
				  },
			
 
				  "nbformat": 4,
			
--- a/hpc/nways/nways_labs/nways_MD/English/C/source_code/cudac/SOLUTION/rdf_malloc.cu
+++ b/hpc/nways/nways_labs/nways_MD/English/C/source_code/cudac/SOLUTION/rdf_malloc.cu
@@ -1,4 +1,3 @@
 
				-// Copyright (c) 2021 NVIDIA Corporation.  All rights reserved. 
			
 
				 #include <stdio.h>
			
 
				 #include <iostream>
			
 
				 #include <fstream>
			
@@ -13,242 +12,210 @@
 
				 
			
 
				 using namespace std;
			
 
				 //additional error handling code
			
 
				-static void HandleError( cudaError_t err,
			
 
				-		const char *file,
			
 
				-		int line ) {
			
 
				-	if (err != cudaSuccess) {
			
 
				-		printf( "%s in %s at line %d\n", cudaGetErrorString( err ),
			
 
				-				file, line );
			
 
				-		exit( EXIT_FAILURE );
			
 
				-	}
			
 
				+static void HandleError(cudaError_t err,
			
 
				+    const char *file, int line) {
			
 
				+    if (err != cudaSuccess) {
			
 
				+        printf( "%s in %s at line %d\n", cudaGetErrorString( err ),
			
 
				+        file, line );
			
 
				+        exit( EXIT_FAILURE );
			
 
				+   }
			
 
				 }
			
 
				 #define HANDLE_ERROR( err ) (HandleError( err, __FILE__, __LINE__ ))
			
 
				 
			
 
				 //declaration of GPU function
			
 
				-__global__ void pair_gpu(const double* d_x, const double* d_y, const double* d_z,  unsigned long long int *d_g2, int numatm, int nconf, 
			
 
				-		const double xbox, const double ybox, const double zbox,  int d_bin,  unsigned long long int bl);
			
 
				+__global__ void pair_gpu(const double* d_x, const double* d_y, const double* d_z,
			
 
				+                         unsigned long long int *d_g2, int numatm, int nconf, 
			
 
				+                         double xbox, double ybox, double zbox, int d_bin);
			
 
				 
			
 
				-int main(int argc , char* argv[] )
			
 
				+int main(int argc , char* argv[])
			
 
				 {
			
 
				-	double xbox,ybox,zbox;
			
 
				-	double* h_x,*h_y,*h_z;
			
 
				-	double* d_x,*d_y,*d_z;
			
 
				-	unsigned long long int *h_g2,*d_g2;
			
 
				-	int nbin;
			
 
				-	int nthreads,device;
			
 
				-	int numatm,nconf,inconf;
			
 
				-	unsigned long long int near2;
			
 
				-	string file;
			
 
				-
			
 
				-	///////////////////////////////////////////////////////////////
			
 
				-
			
 
				-	inconf = 10;
			
 
				-	nbin=2000;
			
 
				-	file = "../input/alk.traj.dcd";
			
 
				-	device = 0;
			
 
				-	nthreads = 128;
			
 
				-	HANDLE_ERROR (cudaSetDevice(device));//pick the device to use
			
 
				-	///////////////////////////////////////
			
 
				-	std::ifstream infile;
			
 
				-	infile.open(file.c_str());
			
 
				-	if(!infile){
			
 
				-		cout<<"file "<<file.c_str()<<" not found\n";
			
 
				-		return 1;
			
 
				-	}
			
 
				-	assert(infile);
			
 
				-
			
 
				-
			
 
				-	ofstream pairfile,stwo;
			
 
				-	pairfile.open("RDF.dat");
			
 
				-	stwo.open("Pair_entropy.dat");
			
 
				-
			
 
				-	/////////////////////////////////////////////////////////
			
 
				-	dcdreadhead(&numatm,&nconf,infile);
			
 
				-	cout<<"Dcd file has "<< numatm << " atoms and " << nconf << " frames"<<endl;
			
 
				-	if (inconf>nconf) cout << "nconf is reset to "<< nconf <<endl;
			
 
				-	else
			
 
				-	{nconf=inconf;}
			
 
				-	cout<<"Calculating RDF for " << nconf << " frames"<<endl;
			
 
				-	////////////////////////////////////////////////////////
			
 
				-
			
 
				-	unsigned long long int sizef= nconf*numatm*sizeof(double);
			
 
				-	unsigned long long int sizebin= nbin*sizeof(unsigned long long int);
			
 
				-
			
 
				-	//Allocate memory on CPU
			
 
				-	HANDLE_ERROR(cudaHostAlloc((void **)&h_x, sizef, cudaHostAllocDefault));
			
 
				-	HANDLE_ERROR(cudaHostAlloc((void **)&h_y, sizef, cudaHostAllocDefault));
			
 
				-	HANDLE_ERROR(cudaHostAlloc((void **)&h_z, sizef, cudaHostAllocDefault));
			
 
				-	HANDLE_ERROR(cudaHostAlloc((void **)&h_g2, sizebin, cudaHostAllocDefault));
			
 
				-
			
 
				-	//Allocate memory on GPU
			
 
				-	HANDLE_ERROR(cudaMalloc((void**)&d_x, sizef));
			
 
				-	HANDLE_ERROR(cudaMalloc((void**)&d_y, sizef));
			
 
				-	HANDLE_ERROR(cudaMalloc((void**)&d_z, sizef));
			
 
				-	HANDLE_ERROR(cudaMalloc((void**)&d_g2, sizebin));
			
 
				-
			
 
				-	HANDLE_ERROR (cudaPeekAtLastError());
			
 
				-
			
 
				-	memset(h_g2,0,sizebin);
			
 
				-
			
 
				-
			
 
				-
			
 
				-	/////////reading cordinates//////////////////////////////////////////////
			
 
				-	nvtxRangePush("Read_File");
			
 
				-	double ax[numatm],ay[numatm],az[numatm];
			
 
				-	for (int i=0;i<nconf;i++) {
			
 
				-		dcdreadframe(ax,ay,az,infile,numatm,xbox,ybox,zbox);
			
 
				-		for (int j=0;j<numatm;j++){
			
 
				-			h_x[i*numatm+j]=ax[j];
			
 
				-			h_y[i*numatm+j]=ay[j];
			
 
				-			h_z[i*numatm+j]=az[j];
			
 
				-		}
			
 
				-	}
			
 
				-	nvtxRangePop(); //pop for REading file
			
 
				-
			
 
				-
			
 
				-	nvtxRangePush("Pair_Calculation");
			
 
				-	//Copy the data from Host to Device before calculation on GPU
			
 
				-	HANDLE_ERROR(cudaMemcpy(d_g2, h_g2, sizebin,cudaMemcpyHostToDevice));
			
 
				-	HANDLE_ERROR(cudaMemcpy(d_x, h_x, sizef, cudaMemcpyHostToDevice));
			
 
				-	HANDLE_ERROR(cudaMemcpy(d_y, h_y, sizef, cudaMemcpyHostToDevice));
			
 
				-	HANDLE_ERROR(cudaMemcpy(d_z, h_z, sizef, cudaMemcpyHostToDevice));
			
 
				-
			
 
				-	cout<<"Reading of input file and transfer to gpu is completed"<<endl;
			
 
				-	//////////////////////////////////////////////////////////////////////////
			
 
				-
			
 
				-	near2=nthreads*(int(0.5*numatm*(numatm-1)/nthreads)+1);
			
 
				-	unsigned long long int nblock = (near2/nthreads);
			
 
				-
			
 
				-	cout<<"Initial blocks are "<<nblock<<" "<<", now changing to ";
			
 
				-
			
 
				-	int maxblock=65535;
			
 
				-	int bl;
			
 
				-	int blockloop= int(nblock/maxblock);
			
 
				-	if (blockloop != 0) {
			
 
				-		nblock=maxblock;
			
 
				-	}
			
 
				-	cout<<nblock<<" and will run over "<<(blockloop+1)<<" blockloops"<<endl;
			
 
				-
			
 
				-	for (bl=0;bl<(blockloop+1);bl++) {
			
 
				-		//cout <<bl<<endl;
			
 
				-		pair_gpu<<< nblock,nthreads >>>
			
 
				-			(d_x, d_y, d_z, d_g2, numatm, nconf, xbox, ybox, zbox, nbin, bl);
			
 
				-
			
 
				-		HANDLE_ERROR (cudaPeekAtLastError());
			
 
				-		HANDLE_ERROR(cudaDeviceSynchronize());
			
 
				-	}
			
 
				-
			
 
				-	HANDLE_ERROR(cudaMemcpy(h_g2, d_g2, sizebin, cudaMemcpyDeviceToHost));
			
 
				-
			
 
				-	nvtxRangePop(); //Pop for Pair Calculation
			
 
				-
			
 
				-	double pi=acos(-1.0l);
			
 
				-	double rho=(numatm)/(xbox*ybox*zbox);
			
 
				-	double norm=(4.0l*pi*rho)/3.0l;
			
 
				-	double rl,ru,nideal;
			
 
				-	double g2[nbin];
			
 
				-	double r,gr,lngr,lngrbond,s2=0.0l,s2bond=0.0l;
			
 
				-	double box=min(xbox,ybox);
			
 
				-	box=min(box,zbox);
			
 
				-	double del=box/(2.0l*nbin);
			
 
				-	nvtxRangePush("Entropy_Calculation");
			
 
				-	for (int i=0;i<nbin;i++) {
			
 
				-		//      cout<<i+1<<" "<<h_g2[i]<<endl;
			
 
				-		rl=(i)*del;
			
 
				-		ru=rl+del;
			
 
				-		nideal=norm*(ru*ru*ru-rl*rl*rl);
			
 
				-		g2[i]=(double)h_g2[i]/((double)nconf*(double)numatm*nideal);
			
 
				-		r=(i)*del;
			
 
				-		pairfile<<(i+0.5l)*del<<" "<<g2[i]<<endl;
			
 
				-		if (r<2.0l) {
			
 
				-			gr=0.0l;
			
 
				-		}
			
 
				-		else {
			
 
				-			gr=g2[i];
			
 
				-		}
			
 
				-		if (gr<1e-5) {
			
 
				-			lngr=0.0l;
			
 
				-		}
			
 
				-		else {
			
 
				-			lngr=log(gr);
			
 
				-		}
			
 
				-
			
 
				-		if (g2[i]<1e-6) {
			
 
				-			lngrbond=0.0l;
			
 
				-		}
			
 
				-		else {
			
 
				-			lngrbond=log(g2[i]);
			
 
				-		}
			
 
				-		s2=s2-2.0l*pi*rho*((gr*lngr)-gr+1.0l)*del*r*r;
			
 
				-		s2bond=s2bond-2.0l*pi*rho*((g2[i]*lngrbond)-g2[i]+1.0l)*del*r*r;
			
 
				-
			
 
				-	}
			
 
				-	nvtxRangePop(); //Pop for Entropy Calculation
			
 
				-	stwo<<"s2 value is "<<s2<<endl;
			
 
				-	stwo<<"s2bond value is "<<s2bond<<endl;
			
 
				-
			
 
				-
			
 
				-
			
 
				-	cout<<"\n\n\n#Freeing Device memory"<<endl;
			
 
				-	HANDLE_ERROR(cudaFree(d_x));
			
 
				-	HANDLE_ERROR(cudaFree(d_y));
			
 
				-	HANDLE_ERROR(cudaFree(d_z));
			
 
				-	HANDLE_ERROR(cudaFree(d_g2));
			
 
				-
			
 
				-	cout<<"#Freeing Host memory"<<endl;
			
 
				-	HANDLE_ERROR(cudaFreeHost ( h_x ) );
			
 
				-	HANDLE_ERROR(cudaFreeHost ( h_y ) );
			
 
				-	HANDLE_ERROR(cudaFreeHost ( h_z ) );
			
 
				-	HANDLE_ERROR(cudaFreeHost ( h_g2 ) );
			
 
				-
			
 
				-	cout<<"#Number of atoms processed: "<<numatm<<endl<<endl;
			
 
				-	cout<<"#Number of confs processed: "<<nconf<<endl<<endl;
			
 
				-	cout<<"#number of threads used: "<<nthreads<<endl<<endl;
			
 
				-	return 0;
			
 
				+    double xbox,ybox,zbox;
			
 
				+    double* h_x,*h_y,*h_z;
			
 
				+    double* d_x,*d_y,*d_z;
			
 
				+    unsigned long long int *h_g2,*d_g2;
			
 
				+    int nbin;
			
 
				+    int device;
			
 
				+    int numatm,nconf,inconf;
			
 
				+    string file;
			
 
				+
			
 
				+    ///////////////////////////////////////////////////////////////
			
 
				+
			
 
				+    inconf = 10;
			
 
				+    nbin = 2000;
			
 
				+    file = "../input/alk.traj.dcd";
			
 
				+    device = 0;
			
 
				+    HANDLE_ERROR (cudaSetDevice(device));//pick the device to use
			
 
				+    ///////////////////////////////////////
			
 
				+    std::ifstream infile;
			
 
				+    infile.open(file.c_str());
			
 
				+    if(!infile){
			
 
				+        cout<<"file "<<file.c_str()<<" not found\n";
			
 
				+        return 1;
			
 
				+    }
			
 
				+    assert(infile);
			
 
				+
			
 
				+
			
 
				+    ofstream pairfile,stwo;
			
 
				+    pairfile.open("RDF.dat");
			
 
				+    stwo.open("Pair_entropy.dat");
			
 
				+
			
 
				+    /////////////////////////////////////////////////////////
			
 
				+    dcdreadhead(&numatm,&nconf,infile);
			
 
				+    cout<<"Dcd file has "<< numatm << " atoms and " << nconf << " frames"<<endl;
			
 
				+    if (inconf>nconf) cout << "nconf is reset to "<< nconf <<endl;
			
 
				+    else {nconf = inconf;}
			
 
				+    cout<<"Calculating RDF for " << nconf << " frames"<<endl;
			
 
				+    ////////////////////////////////////////////////////////
			
 
				+
			
 
				+    unsigned long long int sizef= nconf*numatm*sizeof(double);
			
 
				+    unsigned long long int sizebin= nbin*sizeof(unsigned long long int);
			
 
				+
			
 
				+    //Allocate memory on CPU
			
 
				+    HANDLE_ERROR(cudaHostAlloc((void **)&h_x, sizef, cudaHostAllocDefault));
			
 
				+    HANDLE_ERROR(cudaHostAlloc((void **)&h_y, sizef, cudaHostAllocDefault));
			
 
				+    HANDLE_ERROR(cudaHostAlloc((void **)&h_z, sizef, cudaHostAllocDefault));
			
 
				+    HANDLE_ERROR(cudaHostAlloc((void **)&h_g2, sizebin, cudaHostAllocDefault));
			
 
				+
			
 
				+    //Allocate memory on GPU
			
 
				+    HANDLE_ERROR(cudaMalloc((void**)&d_x, sizef));
			
 
				+    HANDLE_ERROR(cudaMalloc((void**)&d_y, sizef));
			
 
				+    HANDLE_ERROR(cudaMalloc((void**)&d_z, sizef));
			
 
				+    HANDLE_ERROR(cudaMalloc((void**)&d_g2, sizebin));
			
 
				+
			
 
				+    HANDLE_ERROR (cudaPeekAtLastError());
			
 
				+
			
 
				+    memset(h_g2,0,sizebin);
			
 
				+
			
 
				+    /////////reading coordinates//////////////////////////////////////////////
			
 
				+    nvtxRangePush("Read_File");
			
 
				+    double ax[numatm],ay[numatm],az[numatm];
			
 
				+    for (int i=0;i<nconf;i++) {
			
 
				+        dcdreadframe(ax,ay,az,infile,numatm,xbox,ybox,zbox);
			
 
				+        for (int j=0;j<numatm;j++){
			
 
				+            h_x[i*numatm+j]=ax[j];
			
 
				+            h_y[i*numatm+j]=ay[j];
			
 
				+            h_z[i*numatm+j]=az[j];
			
 
				+        }
			
 
				+    }
			
 
				+    nvtxRangePop(); //pop for Reading file
			
 
				+
			
 
				+    nvtxRangePush("Pair_Calculation");
			
 
				+    //Copy the data from Host to Device before calculation on GPU
			
 
				+    HANDLE_ERROR(cudaMemcpy(d_g2, h_g2, sizebin,cudaMemcpyHostToDevice));
			
 
				+    HANDLE_ERROR(cudaMemcpy(d_x, h_x, sizef, cudaMemcpyHostToDevice));
			
 
				+    HANDLE_ERROR(cudaMemcpy(d_y, h_y, sizef, cudaMemcpyHostToDevice));
			
 
				+    HANDLE_ERROR(cudaMemcpy(d_z, h_z, sizef, cudaMemcpyHostToDevice));
			
 
				+
			
 
				+    cout<<"Reading of input file and transfer to gpu is completed"<<endl;
			
 
				+    //////////////////////////////////////////////////////////////////////////
			
 
				+    dim3 nthreads(128, 1, 1);
			
 
				+    dim3 nblock;
			
 
				+    nblock.x = (numatm + nthreads.x - 1)/nthreads.x;
			
 
				+    nblock.y = (numatm + nthreads.y - 1)/nthreads.y;
			
 
				+    nblock.z = 1;
			
 
				+    pair_gpu<<<nblock, nthreads>>>
			
 
				+        (d_x, d_y, d_z, d_g2, numatm, nconf, xbox, ybox, zbox, nbin);
			
 
				+
			
 
				+    HANDLE_ERROR (cudaPeekAtLastError());
			
 
				+    HANDLE_ERROR(cudaDeviceSynchronize());
			
 
				+
			
 
				+    HANDLE_ERROR(cudaMemcpy(h_g2, d_g2, sizebin, cudaMemcpyDeviceToHost));
			
 
				+
			
 
				+    nvtxRangePop(); //Pop for Pair Calculation
			
 
				+
			
 
				+    double pi=acos(-1.0l);
			
 
				+    double rho=(numatm)/(xbox*ybox*zbox);
			
 
				+    double norm=(4.0l*pi*rho)/3.0l;
			
 
				+    double rl,ru,nideal;
			
 
				+    double g2[nbin];
			
 
				+    double r,gr,lngr,lngrbond,s2=0.0l,s2bond=0.0l;
			
 
				+    double box=min(xbox,ybox);
			
 
				+    box=min(box,zbox);
			
 
				+    double del=box/(2.0l*nbin);
			
 
				+    nvtxRangePush("Entropy_Calculation");
			
 
				+    for (int i=0;i<nbin;i++) {
			
 
				+        rl=(i)*del;
			
 
				+        ru=rl+del;
			
 
				+        nideal=norm*(ru*ru*ru-rl*rl*rl);
			
 
				+        g2[i]=(double)h_g2[i]/((double)nconf*(double)numatm*nideal);
			
 
				+        r=(i)*del;
			
 
				+        pairfile<<(i+0.5l)*del<<" "<<g2[i]<<endl;
			
 
				+        if (r<2.0l) {
			
 
				+            gr=0.0l;
			
 
				+        }
			
 
				+        else {
			
 
				+            gr=g2[i];
			
 
				+        }
			
 
				+        if (gr<1e-5) {
			
 
				+            lngr=0.0l;
			
 
				+        }
			
 
				+        else {
			
 
				+            lngr=log(gr);
			
 
				+        }
			
 
				+
			
 
				+        if (g2[i]<1e-6) {
			
 
				+            lngrbond=0.0l;
			
 
				+        }
			
 
				+        else {
			
 
				+            lngrbond=log(g2[i]);
			
 
				+        }
			
 
				+        s2=s2-2.0l*pi*rho*((gr*lngr)-gr+1.0l)*del*r*r;
			
 
				+        s2bond=s2bond-2.0l*pi*rho*((g2[i]*lngrbond)-g2[i]+1.0l)*del*r*r;
			
 
				+
			
 
				+    }
			
 
				+    nvtxRangePop(); //Pop for Entropy Calculation
			
 
				+    stwo<<"s2 value is "<<s2<<endl;
			
 
				+    stwo<<"s2bond value is "<<s2bond<<endl;
			
 
				+
			
 
				+    cout<<"\n\n\n#Freeing Device memory"<<endl;
			
 
				+    HANDLE_ERROR(cudaFree(d_x));
			
 
				+    HANDLE_ERROR(cudaFree(d_y));
			
 
				+    HANDLE_ERROR(cudaFree(d_z));
			
 
				+    HANDLE_ERROR(cudaFree(d_g2));
			
 
				+ 
			
 
				+    cout<<"#Freeing Host memory"<<endl;
			
 
				+    HANDLE_ERROR(cudaFreeHost(h_x));
			
 
				+    HANDLE_ERROR(cudaFreeHost(h_y));
			
 
				+    HANDLE_ERROR(cudaFreeHost(h_z));
			
 
				+    HANDLE_ERROR(cudaFreeHost(h_g2));
			
 
				+
			
 
				+    cout<<"#Number of atoms processed: "<<numatm<<endl<<endl;
			
 
				+    cout<<"#Number of confs processed: "<<nconf<<endl<<endl;
			
 
				+    return 0;
			
 
				 }
			
 
				 
			
 
				-__global__ void pair_gpu(
			
 
				-		const double* d_x, const double* d_y, const double* d_z, 
			
 
				-		unsigned long long int *d_g2, int numatm, int nconf, 
			
 
				-		const double xbox,const double ybox,const double zbox,int d_bin,  unsigned long long int bl)
			
 
				+__global__ void pair_gpu(const double* d_x, const double* d_y, const double* d_z,
			
 
				+    unsigned long long int *d_g2, int numatm, int nconf, 
			
 
				+    double xbox, double ybox, double zbox, int d_bin)
			
 
				 {
			
 
				-	double r,cut,dx,dy,dz;
			
 
				-	int ig2,id1,id2;
			
 
				-	double box;
			
 
				-	box=min(xbox,ybox);
			
 
				-	box=min(box,zbox);
			
 
				-
			
 
				-	double del=box/(2.0*d_bin);
			
 
				-	cut=box*0.5;
			
 
				-	int thisi;
			
 
				-	double n;
			
 
				-
			
 
				-	int i = blockIdx.x * blockDim.x + threadIdx.x;
			
 
				-	int maxi = min(int(0.5*numatm*(numatm-1)-(bl*65535*128)),(65535*128));
			
 
				-
			
 
				-	if ( i < maxi ) {
			
 
				-		thisi=bl*65535*128+i;
			
 
				-
			
 
				-		n=(0.5)*(1+ ((double) sqrt (1.0+4.0*2.0*thisi)));
			
 
				-		id1=int(n);
			
 
				-		id2=thisi-(0.5*id1*(id1-1));
			
 
				-
			
 
				-		for (int frame=0;frame<nconf;frame++){
			
 
				-			dx=d_x[frame*numatm+id1]-d_x[frame*numatm+id2];
			
 
				-			dy=d_y[frame*numatm+id1]-d_y[frame*numatm+id2];
			
 
				-			dz=d_z[frame*numatm+id1]-d_z[frame*numatm+id2];
			
 
				-
			
 
				-			dx=dx-xbox*(round(dx/xbox));
			
 
				-			dy=dy-ybox*(round(dy/ybox));
			
 
				-			dz=dz-zbox*(round(dz/zbox));
			
 
				-
			
 
				-			r=sqrtf(dx*dx+dy*dy+dz*dz);
			
 
				-			if (r<cut) {
			
 
				-				ig2=(int)(r/del);
			
 
				-				atomicAdd(&d_g2[ig2],2) ;
			
 
				-			}
			
 
				-		}
			
 
				-	}
			
 
				+    double r, cut, dx, dy, dz;
			
 
				+    int ig2;
			
 
				+    double box;
			
 
				+    box = min(xbox, ybox);
			
 
				+    box = min(box, zbox);
			
 
				+
			
 
				+    double del = box / (2.0 * d_bin);
			
 
				+    cut = box * 0.5;
			
 
				+
			
 
				+    int id1 = blockIdx.y * blockDim.y + threadIdx.y;
			
 
				+    int id2 = blockIdx.x * blockDim.x + threadIdx.x;
			
 
				+
			
 
				+    if (id1 >= numatm || id2 >= numatm) return;
			
 
				+    if (id1 >= id2) return; // skip self-pairs (id1 == id2) and count each pair only once
			
 
				+
			
 
				+    for (int frame = 0; frame < nconf; ++frame) {
			
 
				+        dx = d_x[frame * numatm + id1] - d_x[frame * numatm + id2];
			
 
				+        dy = d_y[frame * numatm + id1] - d_y[frame * numatm + id2];
			
 
				+        dz = d_z[frame * numatm + id1] - d_z[frame * numatm + id2];
			
 
				+
			
 
				+        dx = dx - xbox * (round(dx / xbox));
			
 
				+        dy = dy - ybox * (round(dy / ybox));
			
 
				+        dz = dz - zbox * (round(dz / zbox));
			
 
				+
			
 
				+        r = sqrt(dx * dx + dy * dy + dz * dz);
			
 
				+        if (r < cut) {
			
 
				+            ig2 = (int)(r / del);
			
 
				+            atomicAdd(&d_g2[ig2], 2);
			
 
				+        }
			
 
				+    }
			
 
				 }
			
 
				-
			
 
				-
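To check the histogram that the `pair_gpu` kernel above produces, a small host-side reference can help. The following is a minimal sketch for a single frame, not code from the repository: the function name `pair_histogram_cpu` and the use of `std::vector` are assumptions made for illustration. It applies the same minimum-image wrap, cutoff and bin width as the kernel, and visits each unordered pair once (`id1 < id2`), so self-pairs are never counted.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for one frame (illustrative only).
// Returns nbin counters; each pair within the cutoff adds 2 to its bin,
// mirroring atomicAdd(&d_g2[ig2], 2) in the kernel.
std::vector<unsigned long long> pair_histogram_cpu(
    const std::vector<double>& x, const std::vector<double>& y,
    const std::vector<double>& z,
    double xbox, double ybox, double zbox, int nbin)
{
    std::vector<unsigned long long> g2(static_cast<std::size_t>(nbin), 0ULL);
    const double box = std::min(std::min(xbox, ybox), zbox);
    const double del = box / (2.0 * nbin);  // bin width
    const double cut = 0.5 * box;           // cutoff radius
    const std::size_t numatm = x.size();

    for (std::size_t id1 = 0; id1 < numatm; ++id1) {
        for (std::size_t id2 = id1 + 1; id2 < numatm; ++id2) {
            double dx = x[id1] - x[id2];
            double dy = y[id1] - y[id2];
            double dz = z[id1] - z[id2];

            // Minimum-image convention for a periodic box.
            dx -= xbox * std::round(dx / xbox);
            dy -= ybox * std::round(dy / ybox);
            dz -= zbox * std::round(dz / zbox);

            const double r = std::sqrt(dx * dx + dy * dy + dz * dz);
            if (r < cut) {
                g2[static_cast<std::size_t>(r / del)] += 2;
            }
        }
    }
    return g2;
}
```

For example, two atoms at x = 0 and x = 9 in a 10 × 10 × 10 box with `nbin = 10` are separated by 1 after the periodic wrap, so the pair lands in bin 2 (bin width 0.5). Running this over each frame and summing the results gives a baseline to compare against `h_g2` after the device-to-host copy.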
			
--- a/hpc/nways/nways_labs/nways_MD/English/C/source_code/cudac/SOLUTION/rdf_unified_memory.cu
+++ b/hpc/nways/nways_labs/nways_MD/English/C/source_code/cudac/SOLUTION/rdf_unified_memory.cu
@@ -12,218 +12,192 @@
 
				 
			
 
				 using namespace std;
			
 
				 //additional error handling code
			
 
				-static void HandleError( cudaError_t err,
			
 
				-		const char *file,
			
 
				-		int line ) {
			
 
				-	if (err != cudaSuccess) {
			
 
				-		printf( "%s in %s at line %d\n", cudaGetErrorString( err ),
			
 
				-				file, line );
			
 
				-		exit( EXIT_FAILURE );
			
 
				-	}
			
 
				+static void HandleError(cudaError_t err,
			
 
				+    const char *file, int line) {
			
 
				+    if (err != cudaSuccess) {
			
 
				+        printf( "%s in %s at line %d\n", cudaGetErrorString( err ),
			
 
				+        file, line );
			
 
				+        exit( EXIT_FAILURE );
			
 
				+   }
			
 
				 }
			
 
				 #define HANDLE_ERROR( err ) (HandleError( err, __FILE__, __LINE__ ))
			
 
				 
			
 
				 //declaration of GPU function
			
 
				-__global__ void pair_gpu(const double* d_x, const double* d_y, const double* d_z,  unsigned long long int *d_g2, int numatm, int nconf, 
			
 
				-		const double xbox, const double ybox, const double zbox,  int d_bin,  unsigned long long int bl);
			
 
				+__global__ void pair_gpu(const double* d_x, const double* d_y, const double* d_z,
			
 
				+                         unsigned long long int *d_g2, int numatm, int nconf, 
			
 
				+                         double xbox, double ybox, double zbox, int d_bin);
			
 
				 
			
 
				-int main(int argc , char* argv[] )
			
 
				+int main(int argc , char* argv[])
			
 
				 {
			
 
				-	double xbox,ybox,zbox;
			
 
				-	double* d_x,*d_y,*d_z;
			
 
				-	unsigned long long int *d_g2;
			
 
				-	int nbin;
			
 
				-	int nthreads,device;
			
 
				-	int numatm,nconf,inconf;
			
 
				-	unsigned long long int near2;
			
 
				-	string file;
			
 
				-
			
 
				-	///////////////////////////////////////////////////////////////
			
 
				-
			
 
				-	inconf = 10;
			
 
				-	nbin=2000;
			
 
				-	file = "../input/alk.traj.dcd";
			
 
				-	device = 0;
			
 
				-	nthreads = 128;
			
 
				-	HANDLE_ERROR (cudaSetDevice(device));//pick the device to use
			
 
				-	///////////////////////////////////////
			
 
				-	std::ifstream infile;
			
 
				-	infile.open(file.c_str());
			
 
				-	if(!infile){
			
 
				-		cout<<"file "<<file.c_str()<<" not found\n";
			
 
				-		return 1;
			
 
				-	}
			
 
				-	assert(infile);
			
 
				-
			
 
				-
			
 
				-	ofstream pairfile,stwo;
			
 
				-	pairfile.open("RDF.dat");
			
 
				-	stwo.open("Pair_entropy.dat");
			
 
				-
			
 
				-	/////////////////////////////////////////////////////////
			
 
				-	dcdreadhead(&numatm,&nconf,infile);
			
 
				-	cout<<"Dcd file has "<< numatm << " atoms and " << nconf << " frames"<<endl;
			
 
				-	if (inconf>nconf) cout << "nconf is reset to "<< nconf <<endl;
			
 
				-	else
			
 
				-	{nconf=inconf;}
			
 
				-	cout<<"Calculating RDF for " << nconf << " frames"<<endl;
			
 
				-	////////////////////////////////////////////////////////
			
 
				-
			
 
				-	unsigned long long int sizef= nconf*numatm*sizeof(double);
			
 
				-	unsigned long long int sizebin= nbin*sizeof(unsigned long long int);
			
 
				-
			
 
				-	// Allocate Unified Memory -- accessible from CPU or GPU
			
 
				-	cudaMallocManaged(&d_x, sizef);
			
 
				-	cudaMallocManaged(&d_y, sizef);
			
 
				-	cudaMallocManaged(&d_z, sizef);
			
 
				-	cudaMallocManaged(&d_g2, sizebin);
			
 
				-
			
 
				-	HANDLE_ERROR (cudaPeekAtLastError());
			
 
				-
			
 
				-	memset(d_g2,0,sizebin);
			
 
				-
			
 
				-	/////////reading cordinates//////////////////////////////////////////////
			
 
				-	nvtxRangePush("Read_File");
			
 
				-	double ax[numatm],ay[numatm],az[numatm];
			
 
				-	for (int i=0;i<nconf;i++) {
			
 
				-		dcdreadframe(ax,ay,az,infile,numatm,xbox,ybox,zbox);
			
 
				-		for (int j=0;j<numatm;j++){
			
 
				-			d_x[i*numatm+j]=ax[j];
			
 
				-			d_y[i*numatm+j]=ay[j];
			
 
				-			d_z[i*numatm+j]=az[j];
			
 
				-		}
			
 
				-	}
			
 
				-	nvtxRangePop(); //pop for Reading file
			
 
				-
			
 
				-	nvtxRangePush("Pair_Calculation");
			
 
				-
			
 
				-	cout<<"Reading of input file and transfer to gpu is completed"<<endl;
			
 
				-	//////////////////////////////////////////////////////////////////////////
			
 
				-
			
 
				-	near2=nthreads*(int(0.5*numatm*(numatm-1)/nthreads)+1);
			
 
				-	unsigned long long int nblock = (near2/nthreads);
			
 
				-
			
 
				-	cout<<"Initial blocks are "<<nblock<<" "<<", now changing to ";
			
 
				-
			
 
				-	int maxblock=65535;
			
 
				-	int bl;
			
 
				-	int blockloop= int(nblock/maxblock);
			
 
				-	if (blockloop != 0) {
			
 
				-		nblock=maxblock;
			
 
				-	}
			
 
				-	cout<<nblock<<" and will run over "<<(blockloop+1)<<" blockloops"<<endl;
			
 
				-
			
 
				-	for (bl=0;bl<(blockloop+1);bl++) {
			
 
				-		//cout <<bl<<endl;
			
 
				-		pair_gpu<<< nblock,nthreads >>>
			
 
				-			(d_x, d_y, d_z, d_g2, numatm, nconf, xbox, ybox, zbox, nbin, bl);
			
 
				-
			
 
				-		HANDLE_ERROR (cudaPeekAtLastError());
			
 
				-		HANDLE_ERROR(cudaDeviceSynchronize());
			
 
				-	}
			
 
				-
			
 
				-	nvtxRangePop(); //Pop for Pair Calculation
			
 
				-
			
 
				-	double pi=acos(-1.0l);
			
 
				-	double rho=(numatm)/(xbox*ybox*zbox);
			
 
				-	double norm=(4.0l*pi*rho)/3.0l;
			
 
				-	double rl,ru,nideal;
			
 
				-	double g2[nbin];
			
 
				-	double r,gr,lngr,lngrbond,s2=0.0l,s2bond=0.0l;
			
 
				-	double box=min(xbox,ybox);
			
 
				-	box=min(box,zbox);
			
 
				-	double del=box/(2.0l*nbin);
			
 
				-	nvtxRangePush("Entropy_Calculation");
			
 
				-	for (int i=0;i<nbin;i++) {
			
 
				-		//      cout<<i+1<<" "<<d_g2[i]<<endl;
			
 
				-		rl=(i)*del;
			
 
				-		ru=rl+del;
			
 
				-		nideal=norm*(ru*ru*ru-rl*rl*rl);
			
 
				-		g2[i]=(double)d_g2[i]/((double)nconf*(double)numatm*nideal);
			
 
				-		r=(i)*del;
			
 
				-		pairfile<<(i+0.5l)*del<<" "<<g2[i]<<endl;
			
 
				-		if (r<2.0l) {
			
 
				-			gr=0.0l;
			
 
				-		}
			
 
				-		else {
			
 
				-			gr=g2[i];
			
 
				-		}
			
 
				-		if (gr<1e-5) {
			
 
				-			lngr=0.0l;
			
 
				-		}
			
 
				-		else {
			
 
				-			lngr=log(gr);
			
 
				-		}
			
 
				-
			
 
				-		if (g2[i]<1e-6) {
			
 
				-			lngrbond=0.0l;
			
 
				-		}
			
 
				-		else {
			
 
				-			lngrbond=log(g2[i]);
			
 
				-		}
			
 
				-		s2=s2-2.0l*pi*rho*((gr*lngr)-gr+1.0l)*del*r*r;
			
 
				-		s2bond=s2bond-2.0l*pi*rho*((g2[i]*lngrbond)-g2[i]+1.0l)*del*r*r;
			
 
				-
			
 
				-	}
			
 
				-	nvtxRangePop(); //Pop for Entropy Calculation
			
 
				-	stwo<<"s2 value is "<<s2<<endl;
			
 
				-	stwo<<"s2bond value is "<<s2bond<<endl;
			
 
				-
			
 
				-	cout<<"#Freeing memory"<<endl;
			
 
				-	  // Free memory
			
 
				-	HANDLE_ERROR(cudaFree(d_x));
			
 
				-	HANDLE_ERROR(cudaFree(d_y));
			
 
				-	HANDLE_ERROR(cudaFree(d_z));
			
 
				-	HANDLE_ERROR(cudaFree(d_g2));
			
 
				-
			
 
				-	cout<<"#Number of atoms processed: "<<numatm<<endl<<endl;
			
 
				-	cout<<"#Number of confs processed: "<<nconf<<endl<<endl;
			
 
				-	cout<<"#number of threads used: "<<nthreads<<endl<<endl;
			
 
				-	return 0;
			
 
				+    double xbox,ybox,zbox;
			
 
				+    double* d_x,*d_y,*d_z;
			
 
				+    unsigned long long int *d_g2;
			
 
				+    int nbin;
			
 
				+    int device;
			
 
				+    int numatm,nconf,inconf;
			
 
				+    string file;
			
 
				+
			
 
				+    ///////////////////////////////////////////////////////////////
			
 
				+
			
 
				+    inconf = 10;
			
 
				+    nbin = 2000;
			
 
				+    file = "../input/alk.traj.dcd";
			
 
				+    device = 0;
			
 
				+    HANDLE_ERROR (cudaSetDevice(device));//pick the device to use
			
 
				+    ///////////////////////////////////////
			
 
				+    std::ifstream infile;
			
 
				+    infile.open(file.c_str());
			
 
				+    if(!infile){
			
 
				+        cout<<"file "<<file.c_str()<<" not found\n";
			
 
				+        return 1;
			
 
				+    }
			
 
				+    assert(infile);
			
 
				+
			
 
				+
			
 
				+    ofstream pairfile,stwo;
			
 
				+    pairfile.open("RDF.dat");
			
 
				+    stwo.open("Pair_entropy.dat");
			
 
				+
			
 
				+    /////////////////////////////////////////////////////////
			
 
				+    dcdreadhead(&numatm,&nconf,infile);
			
 
				+    cout<<"Dcd file has "<< numatm << " atoms and " << nconf << " frames"<<endl;
			
 
				+    if (inconf>nconf) cout << "nconf is reset to "<< nconf <<endl;
			
 
				+    else {nconf = inconf;}
			
 
				+    cout<<"Calculating RDF for " << nconf << " frames"<<endl;
			
 
				+    ////////////////////////////////////////////////////////
			
 
				+
			
 
				+    unsigned long long int sizef= nconf*numatm*sizeof(double);
			
 
				+    unsigned long long int sizebin= nbin*sizeof(unsigned long long int);
			
 
				+
			
 
				+    // Allocate Unified Memory -- accessible from CPU or GPU
			
 
				+    cudaMallocManaged(&d_x, sizef);
			
 
				+    cudaMallocManaged(&d_y, sizef);
			
 
				+    cudaMallocManaged(&d_z, sizef);
			
 
				+    cudaMallocManaged(&d_g2, sizebin);
			
 
				+
			
 
				+    HANDLE_ERROR (cudaPeekAtLastError());
			
 
				+
			
 
				+    memset(d_g2,0,sizebin);
			
 
				+
			
 
				+    /////////reading coordinates//////////////////////////////////////////////
			
 
				+    nvtxRangePush("Read_File");
			
 
				+    double ax[numatm],ay[numatm],az[numatm];
			
 
				+    for (int i=0;i<nconf;i++) {
			
 
				+        dcdreadframe(ax,ay,az,infile,numatm,xbox,ybox,zbox);
			
 
				+        for (int j=0;j<numatm;j++){
			
 
				+            d_x[i*numatm+j]=ax[j];
			
 
				+            d_y[i*numatm+j]=ay[j];
			
 
				+            d_z[i*numatm+j]=az[j];
			
 
				+        }
			
 
				+    }
			
 
				+    nvtxRangePop(); //pop for Reading file
			
 
				+
			
 
				+    nvtxRangePush("Pair_Calculation");
			
 
				+
			
 
				+    cout<<"Reading of input file and transfer to gpu is completed"<<endl;
			
 
				+    //////////////////////////////////////////////////////////////////////////
			
 
				+    dim3 nthreads(128, 1, 1);
			
 
				+    dim3 nblock;
			
 
				+    nblock.x = (numatm + nthreads.x - 1)/nthreads.x;
			
 
				+    nblock.y = (numatm + nthreads.y - 1)/nthreads.y;
			
 
				+    nblock.z = 1;
			
 
				+    pair_gpu<<<nblock, nthreads>>>
			
 
				+        (d_x, d_y, d_z, d_g2, numatm, nconf, xbox, ybox, zbox, nbin);
			
 
				+
			
 
				+    HANDLE_ERROR (cudaPeekAtLastError());
			
 
				+    HANDLE_ERROR(cudaDeviceSynchronize());
			
 
				+
			
 
				+
			
 
				+    nvtxRangePop(); //Pop for Pair Calculation
			
 
				+
			
 
				+    double pi=acos(-1.0l);
			
 
				+    double rho=(numatm)/(xbox*ybox*zbox);
			
 
				+    double norm=(4.0l*pi*rho)/3.0l;
			
 
				+    double rl,ru,nideal;
			
 
				+    double g2[nbin];
			
 
				+    double r,gr,lngr,lngrbond,s2=0.0l,s2bond=0.0l;
			
 
				+    double box=min(xbox,ybox);
			
 
				+    box=min(box,zbox);
			
 
				+    double del=box/(2.0l*nbin);
			
 
				+    nvtxRangePush("Entropy_Calculation");
			
 
				+    for (int i=0;i<nbin;i++) {
			
 
				+        rl=(i)*del;
			
 
				+        ru=rl+del;
			
 
				+        nideal=norm*(ru*ru*ru-rl*rl*rl);
			
 
				+        g2[i]=(double)d_g2[i]/((double)nconf*(double)numatm*nideal);
			
 
				+        r=(i)*del;
			
 
				+        pairfile<<(i+0.5l)*del<<" "<<g2[i]<<endl;
			
 
				+        if (r<2.0l) {
			
 
				+            gr=0.0l;
			
 
				+        }
			
 
				+        else {
			
 
				+            gr=g2[i];
			
 
				+        }
			
 
				+        if (gr<1e-5) {
			
 
				+            lngr=0.0l;
			
 
				+        }
			
 
				+        else {
			
 
				+            lngr=log(gr);
			
 
				+        }
			
 
				+
			
 
				+        if (g2[i]<1e-6) {
			
 
				+            lngrbond=0.0l;
			
 
				+        }
			
 
				+        else {
			
 
				+            lngrbond=log(g2[i]);
			
 
				+        }
			
 
				+        s2=s2-2.0l*pi*rho*((gr*lngr)-gr+1.0l)*del*r*r;
			
 
				+        s2bond=s2bond-2.0l*pi*rho*((g2[i]*lngrbond)-g2[i]+1.0l)*del*r*r;
			
 
				+
			
 
				+    }
			
 
				+    nvtxRangePop(); //Pop for Entropy Calculation
			
 
				+    stwo<<"s2 value is "<<s2<<endl;
			
 
				+    stwo<<"s2bond value is "<<s2bond<<endl;
			
 
				+
			
 
				+    cout<<"#Freeing memory"<<endl;
			
 
				+    // Free memory
			
 
				+    HANDLE_ERROR(cudaFree(d_x));
			
 
				+    HANDLE_ERROR(cudaFree(d_y));
			
 
				+    HANDLE_ERROR(cudaFree(d_z));
			
 
				+    HANDLE_ERROR(cudaFree(d_g2));
			
 
				+
			
 
				+    cout<<"#Number of atoms processed: "<<numatm<<endl<<endl;
			
 
				+    cout<<"#Number of confs processed: "<<nconf<<endl<<endl;
			
 
				+    return 0;
			
 
				 }
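The host-side post-processing in `main()` turns the raw histogram into g(r) and then sums the pair-entropy integrand bin by bin. As a rough cross-check, the same arithmetic can be sketched in Python. The function name `pair_entropy` and its argument layout are hypothetical; the 2.0 cutoff and the 1e-5 / 1e-6 thresholds are simply copied from the C++ loop.

```python
import math

def pair_entropy(counts, nconf, numatm, box, nbin):
    """Mirror the host loop: histogram counts -> g(r), s2 and s2bond."""
    xbox, ybox, zbox = box
    rho = numatm / (xbox * ybox * zbox)      # number density
    norm = 4.0 * math.pi * rho / 3.0         # ideal-gas shell prefactor
    delta = min(box) / (2.0 * nbin)          # bin width, as in the C++ code
    g2, s2, s2bond = [], 0.0, 0.0
    for i in range(nbin):
        rl = i * delta
        ru = rl + delta
        nideal = norm * (ru**3 - rl**3)      # expected count for an ideal gas
        g = counts[i] / (nconf * numatm * nideal)
        g2.append(g)
        r = rl                               # the C++ loop evaluates at the lower bin edge
        gr = 0.0 if r < 2.0 else g           # bins below r = 2.0 are zeroed for s2
        lngr = 0.0 if gr < 1e-5 else math.log(gr)
        lngrbond = 0.0 if g < 1e-6 else math.log(g)
        s2 -= 2.0 * math.pi * rho * (gr * lngr - gr + 1.0) * delta * r * r
        s2bond -= 2.0 * math.pi * rho * (g * lngrbond - g + 1.0) * delta * r * r
    return g2, s2, s2bond
```

For a histogram that matches the ideal-gas expectation in every bin, g(r) is 1 everywhere and `s2bond` vanishes, which makes a convenient sanity check before comparing against GPU output.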
			
 
				 
			
 
				-__global__ void pair_gpu(
			
 
				-		const double* d_x, const double* d_y, const double* d_z, 
			
 
				-		unsigned long long int *d_g2, int numatm, int nconf, 
			
 
				-		const double xbox,const double ybox,const double zbox,int d_bin,  unsigned long long int bl)
			
 
				+__global__ void pair_gpu(const double* d_x, const double* d_y, const double* d_z,
			
 
				+    unsigned long long int *d_g2, int numatm, int nconf, 
			
 
				+    double xbox, double ybox, double zbox, int d_bin)
			
 
				 {
			
 
				-	double r,cut,dx,dy,dz;
			
 
				-	int ig2,id1,id2;
			
 
				-	double box;
			
 
				-	box=min(xbox,ybox);
			
 
				-	box=min(box,zbox);
			
 
				-
			
 
				-	double del=box/(2.0*d_bin);
			
 
				-	cut=box*0.5;
			
 
				-	int thisi;
			
 
				-	double n;
			
 
				-
			
 
				-	int i = blockIdx.x * blockDim.x + threadIdx.x;
			
 
				-	int maxi = min(int(0.5*numatm*(numatm-1)-(bl*65535*128)),(65535*128));
			
 
				-
			
 
				-	if ( i < maxi ) {
			
 
				-		thisi=bl*65535*128+i;
			
 
				-
			
 
				-		n=(0.5)*(1+ ((double) sqrt (1.0+4.0*2.0*thisi)));
			
 
				-		id1=int(n);
			
 
				-		id2=thisi-(0.5*id1*(id1-1));
			
 
				-
			
 
				-		for (int frame=0;frame<nconf;frame++){
			
 
				-			dx=d_x[frame*numatm+id1]-d_x[frame*numatm+id2];
			
 
				-			dy=d_y[frame*numatm+id1]-d_y[frame*numatm+id2];
			
 
				-			dz=d_z[frame*numatm+id1]-d_z[frame*numatm+id2];
			
 
				-
			
 
				-			dx=dx-xbox*(round(dx/xbox));
			
 
				-			dy=dy-ybox*(round(dy/ybox));
			
 
				-			dz=dz-zbox*(round(dz/zbox));
			
 
				-
			
 
				-			r=sqrtf(dx*dx+dy*dy+dz*dz);
			
 
				-			if (r<cut) {
			
 
				-				ig2=(int)(r/del);
			
 
				-				atomicAdd(&d_g2[ig2],2) ;
			
 
				-			}
			
 
				-		}
			
 
				-	}
			
 
				+    double r, cut, dx, dy, dz;
			
 
				+    int ig2;
			
 
				+    double box;
			
 
				+    box = min(xbox, ybox);
			
 
				+    box = min(box, zbox);
			
 
				+
			
 
				+    double del = box / (2.0 * d_bin);
			
 
				+    cut = box * 0.5;
			
 
				+
			
 
				+    int id1 = blockIdx.y * blockDim.y + threadIdx.y;
			
 
				+    int id2 = blockIdx.x * blockDim.x + threadIdx.x;
			
 
				+
			
 
				+    if (id1 >= numatm || id2 >= numatm) return;
			
 
				+    if (id1 >= id2) return;  // skip self-pairs (id1 == id2) and the mirrored half of the pair matrix
			
 
				+
			
 
				+    for (int frame = 0; frame < nconf; ++frame) {
			
 
				+        dx = d_x[frame * numatm + id1] - d_x[frame * numatm + id2];
			
 
				+        dy = d_y[frame * numatm + id1] - d_y[frame * numatm + id2];
			
 
				+        dz = d_z[frame * numatm + id1] - d_z[frame * numatm + id2];
			
 
				+
			
 
				+        dx = dx - xbox * (round(dx / xbox));
			
 
				+        dy = dy - ybox * (round(dy / ybox));
			
 
				+        dz = dz - zbox * (round(dz / zbox));
			
 
				+
			
 
				+        r = sqrtf(dx * dx + dy * dy + dz * dz);
			
 
				+        if (r < cut) {
			
 
				+            ig2 = (int)(r / del);
			
 
				+            atomicAdd(&d_g2[ig2], 2);
			
 
				+        }
			
 
				+    }
			
 
				 }
			
 
				-
			
 
				-
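The rewritten launch maps one thread to each (id1, id2) index pair on a 2D grid instead of looping over chunks of a flattened pair index. The sketch below reproduces the ceil-division grid sizing and the per-pair binning on the CPU for a single frame. The names `grid_dims` and `pair_histogram` are hypothetical. The sketch visits only id1 < id2, so self-pairs are excluded; treat that as an assumption about the intended behaviour. Because the block height is 1, `nblock.y` equals `numatm`, so this layout also assumes `numatm` stays within the `gridDim.y` limit (65535 on current CUDA devices).

```python
import math

def grid_dims(numatm, threads_x=128, threads_y=1):
    """Ceil-divide so every atom index is covered, as the host code does."""
    return ((numatm + threads_x - 1) // threads_x,
            (numatm + threads_y - 1) // threads_y)

def pair_histogram(xs, ys, zs, box, nbin):
    """CPU reference for one frame of the pair kernel."""
    xbox, ybox, zbox = box
    cut = 0.5 * min(box)                  # only distances below half the box are binned
    delta = min(box) / (2.0 * nbin)       # bin width
    hist = [0] * nbin
    n = len(xs)
    for id1 in range(n):                  # plays the role of the y thread index
        for id2 in range(id1 + 1, n):     # x thread index, restricted to id1 < id2
            dx = xs[id1] - xs[id2]
            dy = ys[id1] - ys[id2]
            dz = zs[id1] - zs[id2]
            # Minimum-image convention. Python's round() rounds halves to even,
            # unlike CUDA's round(); they differ only at exactly half a box length.
            dx -= xbox * round(dx / xbox)
            dy -= ybox * round(dy / ybox)
            dz -= zbox * round(dz / zbox)
            r = math.sqrt(dx * dx + dy * dy + dz * dz)
            if r < cut:
                hist[int(r / delta)] += 2  # each unordered pair counts once per atom
    return hist
```

Comparing `pair_histogram` against `d_g2` for a small single-frame input is one way to validate the kernel's indexing after a change like this.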
			
--- a/hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/openmp/nways_openmp.ipynb
+++ b/hpc/nways/nways_labs/nways_MD/English/Fortran/jupyter_notebook/openmp/nways_openmp.ipynb
@@ -174,7 +174,7 @@
 
				     "### ```teams``` directive\n",
			
 
				     "The ```teams``` directive creates a league of thread teams where the master thread of each team executes the region. Each of these master threads executes sequentially. In other words, the ```teams``` directive spawns one or more thread teams with the same number of threads. Execution continues on the master thread of each team (redundantly). No synchronization is allowed between teams. \n",
			
 
				     "\n",
			
 
				-    "OpenMP calls that somewhere a gang, which might be a thread on the CPU or maying a CUDA threadblock or OpenCL workgroup. It will choose how many teams to create based on where you're running, only a few on a CPU (like 1 per CPU core) or lots on a GPU (1000's possibly). ```teams``` allow OpenMP code to scale from small CPUs to large GPUs because each one works completely independently of each other ```teams```.\n",
			
 
				+    "OpenMP calls each such unit a team, which might map to a thread on the CPU or to a CUDA thread block or OpenCL workgroup on a GPU. The runtime chooses how many teams to create based on where the code is running: only a few on a CPU (such as 1 per CPU core) or many on a GPU (possibly thousands). ```teams``` allows OpenMP code to scale from small CPUs to large GPUs because each team works completely independently of the others.\n",
			
 
				     "\n",
			
 
				     "<img src=\"../images/openmp_target_teams.png\">\n",
			
 
				     "\n",
			
--- a/hpc_ai/ai_science_cfd/Dockerfile
+++ b/hpc_ai/ai_science_cfd/Dockerfile
@@ -5,7 +5,7 @@
 
				 # Finally, open http://127.0.0.1:8888/
			
 
				 
			
 
				 # Select Base Image 
			
 
				-FROM nvcr.io/nvidia/tensorflow:20.01-tf2-py3
			
 
				+FROM nvcr.io/nvidia/tensorflow:21.05-tf2-py3
			
 
				 # Update the repo
			
 
				 RUN apt-get update
			
 
				 # Install required dependencies
			
--- a/hpc_ai/ai_science_cfd/English/python/jupyter_notebook/CFD/Part3.ipynb
+++ b/hpc_ai/ai_science_cfd/English/python/jupyter_notebook/CFD/Part3.ipynb
@@ -493,8 +493,7 @@
 
				    "source": [
			
 
				     "#But Training our model from scratch will take a long time\n",
			
 
				     "#So we will load a partially trained model to speedup the process \n",
			
 
				-    "K.clear_session()\n",
			
 
				-    "conv_model = tf.keras.models.load_model(\"conv_model.h5\",custom_objects={'loss_image': loss_image})\n",
			
 
				+    "conv_model.load_weights(\"conv_model.h5\")\n",
			
 
				     "\n",
			
 
				     "history = conv_model.fit(training_dataset, epochs=5, steps_per_epoch=train_batches,\n",
			
 
				     "          validation_data=validation_dataset, validation_steps=validation_batches, \n",
			
@@ -726,8 +725,7 @@
 
				    "source": [
			
 
				     "#But Training our model from scratch will take a long time\n",
			
 
				     "#So we will load a partially trained model to speedup the process \n",
			
 
				-    "K.clear_session()\n",
			
 
				-    "conv_sdf_model = tf.keras.models.load_model(\"conv_sdf_model.h5\",custom_objects={'loss_image': loss_image})\n",
			
 
				+    "conv_sdf_model.load_weights(\"conv_sdf_model.h5\")\n",
			
 
				     "\n",
			
 
				     "history = conv_sdf_model.fit(sdf_training_dataset, epochs=5, steps_per_epoch=train_batches,\n",
			
 
				     "          validation_data=sdf_validation_dataset, validation_steps=validation_batches)\n",
			
--- a/hpc_ai/ai_science_cfd/English/python/jupyter_notebook/Intro_to_DL/CNN's.ipynb
+++ b/hpc_ai/ai_science_cfd/English/python/jupyter_notebook/Intro_to_DL/CNN's.ipynb
--- a/hpc_ai/ai_science_cfd/English/python/jupyter_notebook/Intro_to_DL/Part_2.ipynb
+++ b/hpc_ai/ai_science_cfd/English/python/jupyter_notebook/Intro_to_DL/Part_2.ipynb
@@ -25,21 +25,21 @@
 
				    "metadata": {},
			
 
				    "source": [
			
 
				     "\n",
			
 
				-    "In this notebook, participants will be introduced to CNN, implement it using Keras. For an absolute beginner this notebook would serve as a good starting point.\n",
			
 
				+    "In this notebook you will be introduced to the concept of a convolutional neural network (CNN) and implement one using Keras. This notebook is designed as a starting point for absolute beginners to deep learning.\n",
			
 
				     "\n",
			
 
				-    "**Contents of the this notebook:**\n",
			
 
				+    "**Contents of this notebook:**\n",
			
 
				     "\n",
			
 
				-    "- [How a Deep Learning project is planned ?](#Machine-Learning-Pipeline)\n",
			
 
				-    "- [Wrapping things up with an example ( Classification )](#Wrapping-Things-up-with-an-Example)\n",
			
 
				-    "     - [Fully Connected Networks](#Image-Classification-on-types-of-Clothes)\n",
			
 
				+    "- [How a deep learning project is planned](#Machine-Learning-Pipeline)\n",
			
 
				+    "- [Wrapping things up with an example (classification)](#Wrapping-Things-up-with-an-Example)\n",
			
 
				+    "     - [Fully connected networks](#Image-Classification-on-types-of-Clothes)\n",
			
 
				     "\n",
			
 
				     "\n",
			
 
				-    "**By the end of this notebook participant will:**\n",
			
 
				+    "**By the end of this notebook the participant will:**\n",
			
 
				     "\n",
			
 
				-    "- Understand the Machine Learning Pipeline\n",
			
 
				-    "- Write a Deep Learning Classifier and train it.\n",
			
 
				+    "- Understand machine learning pipelines\n",
			
 
				+    "- Write a deep learning classifier and train it\n",
			
 
				     "\n",
			
 
				-    "**We will be building a _Multi-class Classifier_ to classify images of clothing to their respective classes**"
			
 
				+    "**We will be building a _multi-class classifier_ to classify images of clothing into their respective classes.**"
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -48,29 +48,29 @@
 
				    "source": [
			
 
				     "## Machine Learning Pipeline\n",
			
 
				     "\n",
			
 
				-    "During the bootcamp we will be making use of the following buckets to help us understand how a Machine Learning project should be planned and executed: \n",
			
 
				+    "During the bootcamp we will be making use of the following concepts to help us understand how a machine learning (ML) project should be planned and executed:\n",
			
 
				     "\n",
			
 
				     "1. **Data**: To start any ML project we need data which is pre-processed and can be fed into the network.\n",
			
 
				-    "2. **Task**: There are many tasks present in ML, we need to make sure we understand and define the problem statement accurately.\n",
			
 
				-    "3. **Model**: We need to build our model, which is neither too deep and taking a lot of computational power or too small that it could not learn the important features.\n",
			
 
				-    "4. **Loss**: Out of the many _loss functions_ present, we need to carefully choose a _loss function_ which is suitable for the task we are about to carry out.\n",
			
 
				-    "5. **Learning**: As we mentioned in our last notebook, there are a variety of _optimisers_ each with their advantages and disadvantages. So here we choose an _optimiser_ which is suitable for our task and train our model using the set hyperparameters.\n",
			
 
				-    "6. **Evaluation**: This is a crucial step in the process to determine if our model has learnt the features properly by analysing how it performs when unseen data is given to it. "
			
 
				+    "2. **Task**: There are many possible tasks in the field of ML; we need to make sure we understand and define the problem statement accurately.\n",
			
 
				+    "3. **Model**: We need to build a model that is neither too deep (requiring a lot of computational power) nor too small (preventing it from learning the important features).\n",
			
 
				+    "4. **Loss**: Out of the many _loss functions_ that can be defined, we need to carefully choose one which is suitable for the task we are about to carry out.\n",
			
 
				+    "5. **Learning**: There are a variety of _optimisers_, each with their advantages and disadvantages. We must choose one which is suitable for our task and train our model using some suitably chosen hyperparameters.\n",
			
 
				+    "6. **Evaluation**: We must determine if our model has learned the features properly by analysing how it performs on data it has not previously seen. "
			
 
				    ]
			
 
				   },
			
 
				   {
			
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "**Here we will be building a _Multi-class Classifier_ to classify images of clothing to their respective classes.**\n",
			
 
				+    "**Here we will be building a _multi-class classifier_ to classify images of clothing into their respective classes.**\n",
			
 
				     "\n",
			
 
				-    "We will follow the above discussed pipeline to complete the example.\n",
			
 
				+    "We will follow the pipeline presented above to complete the example.\n",
			
 
				     "\n",
			
 
				-    "## Image Classification on types of clothes  \n",
			
 
				+    "## Image classification on types of clothes  \n",
			
 
				     "\n",
			
 
				-    "####  Step -1 : Data \n",
			
 
				+    "####  Step 1: Data \n",
			
 
				     "\n",
			
 
				-    "We will be using the **F-MNIST ( Fashion MNIST )** dataset, which is a very popular dataset. This dataset contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at low resolution (28 by 28 pixels).\n",
			
 
				+    "We will be using the **Fashion MNIST** dataset, which is a very popular introductory dataset in deep learning. This dataset contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at low resolution (28 by 28 pixels).\n",
			
 
				     "\n",
			
 
				     "<img src=\"images/fashion-mnist.png\" alt=\"Fashion MNIST sprite\"  width=\"600\">\n",
			
 
				     "\n",
			
@@ -196,27 +196,27 @@
 
				    "metadata": {},
			
 
				    "outputs": [],
			
 
				    "source": [
			
 
				-    "#Print Array Size of Training Set \n",
			
 
				-    "print(\"Size of Training Images :\"+str(train_images.shape))\n",
			
 
				-    "#Print Array Size of Label\n",
			
 
				-    "print(\"Size of Training Labels :\"+str(train_labels.shape))\n",
			
 
				-    "\n",
			
 
				-    "#Print Array Size of Test Set \n",
			
 
				-    "print(\"Size of Test Images :\"+str(test_images.shape))\n",
			
 
				-    "#Print Array Size of Label\n",
			
 
				-    "print(\"Size of Test Labels :\"+str(test_labels.shape))\n",
			
 
				-    "\n",
			
 
				-    "#Let's See how our Outputs Look like \n",
			
 
				-    "print(\"Training Set Labels :\"+str(train_labels))\n",
			
 
				-    "#Data in the Test Set\n",
			
 
				-    "print(\"Test Set Labels :\"+str(test_labels))"
			
 
				+    "# Print array size of training dataset\n",
			
 
				+    "print(\"Size of Training Images: \" + str(train_images.shape))\n",
			
 
				+    "# Print array size of labels\n",
			
 
				+    "print(\"Size of Training Labels: \" + str(train_labels.shape))\n",
			
 
				+    "\n",
			
 
				+    "# Print array size of test dataset\n",
			
 
				+    "print(\"Size of Test Images: \" + str(test_images.shape))\n",
			
 
				+    "# Print array size of labels\n",
			
 
				+    "print(\"Size of Test Labels: \" + str(test_labels.shape))\n",
			
 
				+    "\n",
			
 
				+    "# Let's see how our outputs look\n",
			
 
				+    "print(\"Training Set Labels: \" + str(train_labels))\n",
			
 
				+    "# Data in the test dataset\n",
			
 
				+    "print(\"Test Set Labels: \" + str(test_labels))"
			
 
				    ]
			
 
				   },
			
 
				   {
			
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "## Data Pre-processing\n"
			
 
				+    "## Data Preprocessing\n"
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -236,7 +236,7 @@
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "The image pixel values range from 0 to 255. Let us now normalise the data range from 0 - 255 to 0 - 1 in both the *Train* and *Test* set. This Normalisation of pixels helps us by optimizing the process where the gradients are computed."
			
 
				+    "The image pixel values range from 0 to 255. Let us now normalise them from the 0-255 range to the 0-1 range in both the *Train* and *Test* sets. Keeping the inputs in a small, consistent range helps gradient-based training behave well."
			
 
				    ]
			
 
				   },
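As a small illustration of the rescaling described above, the sketch below divides 8-bit pixel values by 255 using plain Python lists. The helper name `normalise_pixels` and the tiny 2x2 image are hypothetical; in the notebook the same division would typically be applied to the whole image array at once.

```python
def normalise_pixels(image, max_value=255.0):
    """Rescale a 2D list of 0-255 pixel values to the 0-1 range."""
    return [[pixel / max_value for pixel in row] for row in image]

tiny_image = [[0, 51], [204, 255]]     # made-up 2x2 grayscale image
scaled = normalise_pixels(tiny_image)
print(scaled)
```

The same scaling must be applied to both the training and the test images so that the model sees inputs in the same range at evaluation time.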
			
 
				   {
			
@@ -255,7 +255,7 @@
 
				    "metadata": {},
			
 
				    "outputs": [],
			
 
				    "source": [
			
 
				-    "# Let's Print to Veryify if the Data is of the correct format.\n",
			
 
				+    "# Let's print to verify whether the data is of the correct format.\n",
			
 
				     "plt.figure(figsize=(10,10))\n",
			
 
				     "for i in range(25):\n",
			
 
				     "    plt.subplot(5,5,i+1)\n",
			
@@ -273,13 +273,13 @@
 
				    "source": [
			
 
				     "## Defining our Model\n",
			
 
				     "\n",
			
 
				-    "Our Model has three layers :\n",
			
 
				+    "Our model has three layers:\n",
			
 
				     "\n",
			
 
				-    "- 784 Input features ( 28 * 28 ) \n",
			
 
				-    "- 128 nodes in hidden layer (Feel free to experiment with the value)\n",
			
 
				-    "- 10 output nodes to denote the Class\n",
			
 
				+    "- 784 input features (28 * 28)\n",
			
 
				+    "- 128 nodes in the hidden layer (feel free to experiment with this value)\n",
			
 
				+    "- 10 output nodes to denote the class\n",
			
 
				     "\n",
			
 
				-    "Implementing the same in Keras ( Machine Learning framework built on top of Tensorflow, Theano, etc..) \n"
			
 
				+    "We will implement this model in Keras (TensorFlow's high-level API for machine learning).\n"
			
 
				    ]
			
 
				   },
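To get a feel for the size of this network, the trainable parameters can be counted by hand. The sketch below assumes both layers are fully connected and carry one bias per output unit; the helper name `dense_params` is hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus one bias per output unit for a fully connected layer."""
    return n_in * n_out + n_out

hidden = dense_params(28 * 28, 128)   # 784 inputs -> 128 hidden units
output = dense_params(128, 10)        # 128 hidden units -> 10 classes
total = hidden + output
print(hidden, output, total)          # 100480 1290 101770
```

Changing the hidden-layer width, as the text suggests, changes both terms, so the total grows roughly linearly with the number of hidden units.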
			
 
				   {
			
@@ -346,7 +346,7 @@
 
				    "metadata": {},
			
 
				    "outputs": [],
			
 
				    "source": [
			
 
				-    "model.fit(train_images, train_labels ,epochs=5)"
			
 
				+    "model.fit(train_images, train_labels, epochs=5)"
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -364,7 +364,7 @@
 
				    "metadata": {},
			
 
				    "outputs": [],
			
 
				    "source": [
			
 
				-    "#Evaluating the Model using the Test Set\n",
			
 
				+    "# Evaluating the model using the test dataset\n",
			
 
				     "\n",
			
 
				     "test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)\n",
			
 
				     "\n",
			
@@ -375,14 +375,14 @@
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "We get an Accuracy of 87% in the Test dataset which is less than the 89% we got during the Training phase, This problem in ML is called as Overfitting, and we have discussed the same in the previous notebook. \n",
			
 
				+    "We get an accuracy of 87% on the test dataset, which is less than the 89% we got during the training phase. This problem in machine learning is called overfitting.\n",
			
 
				     "\n",
			
 
				     "## Exercise\n",
			
 
				     "\n",
			
 
				-    "Try adding more dense layers to the network above and observe change in accuracy.\n",
			
 
				+    "Try adding more dense layers to the network above and observe the change in accuracy.\n",
			
 
				     "\n",
			
 
				     "## Important:\n",
			
 
				-    "<mark>Shutdown the kernel before clicking on “Next Notebook” to free up the GPU memory.</mark>\n",
			
 
				+    "<mark>Shut down the kernel before clicking on “Next Notebook” to free up the GPU memory.</mark>\n",
			
 
				     "\n",
			
 
				     "\n",
			
 
				     "## Licensing\n",
			
--- a/hpc_ai/ai_science_cfd/English/python/jupyter_notebook/Intro_to_DL/Resnets.ipynb
+++ b/hpc_ai/ai_science_cfd/English/python/jupyter_notebook/Intro_to_DL/Resnets.ipynb
--- a/hpc_ai/ai_science_cfd/English/python/jupyter_notebook/Start_Here.ipynb
+++ b/hpc_ai/ai_science_cfd/English/python/jupyter_notebook/Start_Here.ipynb
@@ -6,15 +6,15 @@
 
				    "source": [
			
 
				     "# Welcome to AI for Science Bootcamp\n",
			
 
				     "\n",
			
 
				-    "The objective of this bootcamp is to give an introduction to application of Artificial Intelligence (AI) algorithms in Science ( High Performance Computing(HPC) Simulations ). This bootcamp will introduce participants to fundamentals of AI and how those can be applied to different HPC simulation domains. \n",
			
 
				+    "The objective of this bootcamp is to provide an introduction to applications of artificial intelligence (AI) algorithms in scientific high performance computing (HPC). This bootcamp will introduce participants to the fundamentals of AI and how AI can be applied to HPC simulation domains.\n",
			
 
				     "\n",
			
 
				-    "The following contents will be covered during the Bootcamp :\n",
			
 
				+    "The following contents will be covered during the bootcamp:\n",
			
 
				     "- [CNN Primer and Keras 101](Intro_to_DL/Part_2.ipynb)\n",
			
 
				     "- [Steady State Flow using Neural Networks](CFD/Start_Here.ipynb)\n",
			
 
				     "\n",
			
 
				     "## Quick GPU Check\n",
			
 
				     "\n",
			
 
				-    "Before moving forward let us check if Tensorflow backend is able to see and use GPU"
			
 
				+    "Before moving forward let us verify that TensorFlow is able to see and use your GPU."
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -45,7 +45,7 @@
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "The output of the cell above should show all the available compaitable GPU on the system. If no GPU device is listed or you see an error means that there was no compaitable GPU present on the system and the future calls may run on CPU consuming more time."
			
 
				+    "The output of the cell above should show an available compatible GPU on the system (if there are multiple GPUs, only device 0 will be shown). If no GPU device is listed or you see an error, it means that there was no compatible GPU present on the system, and the future calls may run on the CPU, consuming more time."
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -54,15 +54,15 @@
 
				    "source": [
			
 
				     "## [CNN Primer and Keras 101](Intro_to_DL/Part_2.ipynb)\n",
			
 
				     "\n",
			
 
				-    "In this notebook, participants will be introduced to Convolution Neural Network and how to implement one using Keras API. For an absolute beginner to CNN and Keras this notebook would serve as a good starting point.\n",
			
 
				+    "In this notebook, participants will be introduced to convolutional neural networks (CNNs) and how to implement one using the Keras API in TensorFlow. This notebook would serve as a good starting point for absolute beginners to neural networks.\n",
			
 
				     "\n",
			
 
				     "**By the end of this notebook you will:**\n",
			
 
				     "\n",
			
 
				-    "- Understand the Machine Learning pipeline\n",
			
 
				-    "- Understand how a Convolution Neural Network works\n",
			
 
				-    "- Write your own Deep Learning classifier and train it.\n",
			
 
				+    "- Understand machine learning pipelines\n",
			
 
				+    "- Understand how a convolutional neural network works\n",
			
 
				+    "- Write your own deep learning classifier and train it\n",
			
 
				     "\n",
			
 
				-    "For in depth understanding of Deep Learning Concepts, visit [NVIDIA Deep Learning Institute](https://www.nvidia.com/en-us/deep-learning-ai/education/)"
			
 
				+    "For an in-depth understanding of deep learning concepts, visit the [NVIDIA Deep Learning Institute](https://www.nvidia.com/en-us/deep-learning-ai/education/)."
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -71,24 +71,24 @@
 
				    "source": [
			
 
				     "## [Steady State Flow using Neural Networks](CFD/Start_Here.ipynb)\n",
			
 
				     "\n",
			
 
				-    "In this notebook, participant will be introduced to how Deep Learning can be applied in the field of Fluid Dynamics.\n",
			
 
				+    "In this notebook, participants will be introduced to how deep learning can be applied in the field of fluid dynamics.\n",
			
 
				     "\n",
			
 
				     "**Contents of this notebook:**\n",
			
 
				     "\n",
			
 
				     "- Understanding the problem statement\n",
			
 
				-    "- Building a Deep Learning Pipeline\n",
			
 
				-    "    - Understand data and task\n",
			
 
				+    "- Building a deep learning pipeline\n",
			
 
				+    "    - Understand the data and the task\n",
			
 
				     "    - Discuss various models\n",
			
 
				-    "    - Define Neural network parameters\n",
			
 
				-    "- Fully Connected Networks\n",
			
 
				+    "    - Define neural network parameters\n",
			
 
				+    "- Fully connected networks\n",
			
 
				     "- Convolutional models\n",
			
 
				     "- Advanced networks\n",
			
 
				     "\n",
			
 
				-    "**By the end of the notebook participant will:** \n",
			
 
				+    "**By the end of the notebook the participant will:** \n",
			
 
				     "\n",
			
 
				-    "- Understand the process of applying Deep Learning to Computational Fluid Dynamics\n",
			
 
				-    "- Understanding how Residual Blocks work.\n",
			
 
				-    "- Benchmark between different models and how they compare against one another.\n",
			
 
				+    "- Understand the process of applying deep learning to computational fluid dynamics\n",
			
 
				+    "- Understand how residual blocks work\n",
			
 
				+    "- Benchmark between different models and how they compare against one another\n",
			
 
				     "\n",
			
 
				     "## Licensing\n",
			
 
				     "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0)"
			
--- a/hpc_ai/ai_science_cfd/English/python/source_code/dataset.py
+++ b/hpc_ai/ai_science_cfd/English/python/source_code/dataset.py
@@ -27,27 +27,27 @@
 
				 import gdown
			
 
				 import os
			
 
				 ## CFD TRAIN DATASET
			
 
				-url = 'https://drive.google.com/uc?id=0BzsbU65NgrSuZDBMOW93OWpsMHM&export=download'
			
 
				+url = 'https://drive.google.com/uc?id=1VZOPUG6mHsRYG58H_l3_LOPM4N4f9LiZ&export=download'
			
 
				 output = '/workspace/python/jupyter_notebook/CFD/data/train.tfrecords'
			
 
				 gdown.download(url, output, quiet=False,proxy=None)
			
 
				 
			
 
				 ## CFD TEST DATASET
			
 
				-url = 'https://drive.google.com/uc?id=1WSJLK0cOQehixJ6Tf5k0eYDcb4RJ5mXv&export=download'
			
 
				+url = 'https://drive.google.com/uc?id=1fTo0L0ckqGEeZjLwefBc4S5e28psixle&export=download'
			
 
				 output = '/workspace/python/jupyter_notebook/CFD/data/test.tfrecords'
			
 
				 gdown.download(url, output, quiet=False,proxy=None)
			
 
				 
			
 
				 ## CFD CONV_SDF MODEL
			
 
				-url = 'https://drive.google.com/uc?id=1pfR0io1CZKvXArGk-nt2wciUoAN_6Z08&export=download'
			
 
				+url = 'https://drive.google.com/uc?id=1ObX4jjhv2wkaTfI-ai09SyoOqVP20jAU&export=download'
			
 
				 output = '/workspace/python/jupyter_notebook/CFD/conv_sdf_model.h5'
			
 
				 gdown.download(url, output, quiet=False,proxy=None)
			
 
				 
			
 
				 ## CFD CONV MODEL
			
 
				-url = 'https://drive.google.com/uc?id=1rFhqlQnTkzIyZocjAxMffucmS3FDI0_j&export=download'
			
 
				+url = 'https://drive.google.com/uc?id=1xfw9C7PFrd3e_ef92ZZbRuK__ak7mo0f&export=download'
			
 
				 output = '/workspace/python/jupyter_notebook/CFD/conv_model.h5'
			
 
				 gdown.download(url, output, quiet=False,proxy=None)
			
 
				 
			
 
				 
			
 
				 ## CFD TEST Dataset
			
 
				-url = 'https://drive.google.com/uc?id=0BzsbU65NgrSuR2NRRjBRMDVHaDQ&export=download'
			
 
				+url = 'https://drive.google.com/uc?id=1VG9jCTBcERytV7w5bHoaVIZSQOa-AlmU&export=download'
			
 
				 output = '/workspace/python/jupyter_notebook/CFD/data/computed_car_flow.zip'
			
 
				 gdown.cached_download(url, output, quiet=False,proxy=None,postprocess=gdown.extractall)
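
Review note: the updated `dataset.py` repeats the same `url` / `output` / `gdown.download(...)` triple for each file, so every future file ID change touches several near-identical lines. A minimal sketch of a table-driven alternative follows. The names `CFD_FILES`, `drive_url`, and `download_all` are hypothetical and not part of the repository; the file IDs and output paths are copied from the added lines above. The zip archive that uses `gdown.cached_download` with `postprocess=gdown.extractall` is deliberately left out, since it needs a different call.

```python
from typing import Callable, Dict, List, Optional

# URL pattern used by every entry in dataset.py.
DRIVE_URL = "https://drive.google.com/uc?id={file_id}&export=download"

# Hypothetical table: Google Drive file ID -> output path (values taken from the diff).
CFD_FILES: Dict[str, str] = {
    "1VZOPUG6mHsRYG58H_l3_LOPM4N4f9LiZ": "/workspace/python/jupyter_notebook/CFD/data/train.tfrecords",
    "1fTo0L0ckqGEeZjLwefBc4S5e28psixle": "/workspace/python/jupyter_notebook/CFD/data/test.tfrecords",
    "1ObX4jjhv2wkaTfI-ai09SyoOqVP20jAU": "/workspace/python/jupyter_notebook/CFD/conv_sdf_model.h5",
    "1xfw9C7PFrd3e_ef92ZZbRuK__ak7mo0f": "/workspace/python/jupyter_notebook/CFD/conv_model.h5",
}


def drive_url(file_id: str) -> str:
    """Build the download URL for a Google Drive file ID."""
    return DRIVE_URL.format(file_id=file_id)


def download_all(
    files: Dict[str, str],
    downloader: Optional[Callable[[str, str], object]] = None,
) -> List[str]:
    """Download every (file ID, output path) pair and return the output paths.

    When no downloader is passed, gdown is imported lazily and called with
    the same arguments that dataset.py already uses.
    """
    if downloader is None:
        import gdown  # imported here so the module loads without gdown installed

        def downloader(url: str, output: str) -> object:
            return gdown.download(url, output, quiet=False, proxy=None)

    fetched: List[str] = []
    for file_id, output in files.items():
        downloader(drive_url(file_id), output)
        fetched.append(output)
    return fetched
```

Calling `download_all(CFD_FILES)` inside the container would fetch the same four files as the script above. Passing a stub `downloader` makes the URL construction checkable without network access.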
			
--- a/hpc_ai/ai_science_cfd/Singularity
+++ b/hpc_ai/ai_science_cfd/Singularity
@@ -1,7 +1,7 @@
 
				 # Copyright (c) 2020 NVIDIA Corporation.  All rights reserved. 
			
 
				 
			
 
				 Bootstrap: docker
			
 
				-FROM: nvcr.io/nvidia/tensorflow:20.01-tf2-py3
			
 
				+FROM: nvcr.io/nvidia/tensorflow:21.05-tf2-py3
			
 
				 
			
 
				 %environment
			
 
				 %post
			
--- a/hpc_ai/ai_science_climate/Dockerfile
+++ b/hpc_ai/ai_science_climate/Dockerfile
@@ -5,7 +5,7 @@
 
				 # Finally, open http://127.0.0.1:8888/
			
 
				 
			
 
				 # Select Base Image 
			
 
				-FROM nvcr.io/nvidia/tensorflow:20.01-tf2-py3
			
 
				+FROM nvcr.io/nvidia/tensorflow:21.05-tf2-py3
			
 
				 # Update the repo
			
 
				 RUN apt-get update -y
			
 
				 # Install required dependencies
			
--- a/hpc_ai/ai_science_climate/English/python/jupyter_notebook/Tropical_Cyclone_Intensity_Estimation/Competition.ipynb
+++ b/hpc_ai/ai_science_climate/English/python/jupyter_notebook/Tropical_Cyclone_Intensity_Estimation/Competition.ipynb
@@ -201,7 +201,7 @@
 
				    "outputs": [],
			
 
				    "source": [
			
 
				     "import numpy as np\n",
			
 
				-    "np.random.seed(1337)\n",
			
 
				+    "tf.random.set_seed(1337)\n",
			
 
				     "\n",
			
 
				     "import tensorflow.keras\n",
			
 
				     "from tensorflow.keras.models import Sequential\n",
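
Review note: this hunk replaces `np.random.seed(1337)` with `tf.random.set_seed(1337)`. `tf.random.set_seed` only sets TensorFlow's global seed, so any NumPy-based shuffling or sampling in the notebook is no longer seeded after this change. The new line also relies on `tf` having been imported in an earlier cell, which this hunk does not show. A minimal sketch of a helper that seeds all three generators follows. `seed_everything` is a hypothetical name, not something the notebooks define, and the optional-import checks are only there so the sketch also runs where NumPy or TensorFlow is missing.

```python
import importlib.util
import random
from typing import List


def seed_everything(seed: int) -> List[str]:
    """Seed Python, NumPy, and TensorFlow RNGs; return the libraries that were seeded."""
    seeded = ["random"]
    random.seed(seed)

    # NumPy is seeded separately: tf.random.set_seed does not touch it.
    if importlib.util.find_spec("numpy") is not None:
        import numpy as np

        np.random.seed(seed)
        seeded.append("numpy")

    # TensorFlow's global seed, as used in the updated notebook cell.
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf

        tf.random.set_seed(seed)
        seeded.append("tensorflow")

    return seeded
```

In the notebook cell, `seed_everything(1337)` would stand in for the single `tf.random.set_seed(1337)` line. Full run-to-run reproducibility on GPU can still depend on non-deterministic kernels, which this sketch does not address.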
			
--- a/hpc_ai/ai_science_climate/English/python/jupyter_notebook/Tropical_Cyclone_Intensity_Estimation/Countering_Data_Imbalance.ipynb
+++ b/hpc_ai/ai_science_climate/English/python/jupyter_notebook/Tropical_Cyclone_Intensity_Estimation/Countering_Data_Imbalance.ipynb
@@ -261,7 +261,7 @@
 
				    "outputs": [],
			
 
				    "source": [
			
 
				     "import numpy as np\n",
			
 
				-    "np.random.seed(1337)\n",
			
 
				+    "tf.random.set_seed(1337)\n",
			
 
				     "\n",
			
 
				     "import tensorflow.keras\n",
			
 
				     "from tensorflow.keras.models import Sequential\n",
			
@@ -313,8 +313,7 @@
 
				     "\n",
			
 
				     "#But Training our model from scratch will take a long time\n",
			
 
				     "#So we will load a partially trained model to speedup the process \n",
			
 
				-    "K.clear_session()\n",
			
 
				-    "model = tf.keras.models.load_model(\"trained_16.h5\",custom_objects={'top2_acc': top2_acc})\n",
			
 
				+    "model.load_weights(\"trained_16.h5\")\n",
			
 
				     "\n",
			
 
				     "# Optimizer\n",
			
 
				     "sgd = tensorflow.keras.optimizers.SGD(lr=0.001, decay=1e-6, momentum=0.9)\n",
			
--- a/hpc_ai/ai_science_climate/English/python/jupyter_notebook/Tropical_Cyclone_Intensity_Estimation/Manipulation_of_Image_Data_and_Category_Determination_using_Text_Data.ipynb
+++ b/hpc_ai/ai_science_climate/English/python/jupyter_notebook/Tropical_Cyclone_Intensity_Estimation/Manipulation_of_Image_Data_and_Category_Determination_using_Text_Data.ipynb
@@ -472,7 +472,7 @@
 
				     "import os\n",
			
 
				     "\n",
			
 
				     "os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\"\n",
			
 
				-    "np.random.seed(1337)\n",
			
 
				+    "tf.random.set_seed(1337)\n",
			
 
				     "\n",
			
 
				     "import tensorflow.keras\n",
			
 
				     "from tensorflow.keras.models import Sequential\n",
			
@@ -541,8 +541,7 @@
 
				     "\n",
			
 
				     "#But Training our model from scratch will take a long time\n",
			
 
				     "#So we will load a partially trained model to speedup the process \n",
			
 
				-    "K.clear_session()\n",
			
 
				-    "model = tf.keras.models.load_model(\"trained_16.h5\",custom_objects={'top2_acc': top2_acc})\n",
			
 
				+    "model.load_weights(\"trained_16.h5\")\n",
			
 
				     "\n",
			
 
				     "# Optimizer\n",
			
 
				     "sgd = tensorflow.keras.optimizers.SGD(lr=0.001, decay=1e-6, momentum=0.9)\n",
			
--- a/hpc_ai/ai_science_climate/English/python/source_code/dataset.py
+++ b/hpc_ai/ai_science_climate/English/python/source_code/dataset.py
@@ -28,11 +28,11 @@ import gdown
 
				 import os
			
 
				 
			
 
				 ## TC TL MODEL  
			
 
				-url = 'https://drive.google.com/uc?id=1crZ7dHhMRuE_N-8NAzF7Nrb6AVfpxThj&export=download'
			
 
				+url = 'https://drive.google.com/uc?id=1Rb9gKSDdLC8y8yMcDqeOHAGj9qA0mDLJ&export=download'
			
 
				 output = '/workspace/python/jupyter_notebook/Tropical_Cyclone_Intensity_Estimation/trained_16.h5'
			
 
				 gdown.download(url, output, quiet=False,proxy=None)
			
 
				 ## TC Dataset  
			
 
				-url = 'https://drive.google.com/uc?id=1x0vNpYMa4UM95svCL_Cxty8rzjQmG0Rz&export=download'
			
 
				+url = 'https://drive.google.com/uc?id=1vMXpbWx_-DO8CNkG68eErzcREvfouT5d&export=download'
			
 
				 output = '/workspace/python/jupyter_notebook/Tropical_Cyclone_Intensity_Estimation/dataset.zip'
			
 
				 gdown.cached_download(url, output, quiet=False,proxy=None,postprocess=gdown.extractall)
			
 
				 
			
--- a/hpc_ai/ai_science_climate/Singularity
+++ b/hpc_ai/ai_science_climate/Singularity
@@ -1,7 +1,7 @@
 
				 # Copyright (c) 2020 NVIDIA Corporation.  All rights reserved.
			
 
				 
			
 
				 Bootstrap: docker
			
 
				-FROM: nvcr.io/nvidia/tensorflow:20.01-tf2-py3
			
 
				+FROM: nvcr.io/nvidia/tensorflow:21.05-tf2-py3
			
 
				 
			
 
				 %environment
			
 
				 %post