README.md 57b5cefde5 Updated all `.md` files to contain newest image 2 years ago
cuda_matmul.cu 4966d51985 added code for gpu arch (part 1) blog post 2 years ago
profiler_demo_utils.py 5e428a94f2 added code for part 2 2 years ago
pytorch_profiler_demo.py 5e428a94f2 added code for part 2 2 years ago

README.md

Demystifying GPU architectures for deep learning

This folder contains code for the following blog posts.

  1. Demystifying GPU architectures for deep learning part 1
  2. Demystifying GPU architectures for deep learning part 2

Environment

All code was tested on a PC with an NVIDIA GeForce RTX 3090 GPU and an AMD Ryzen 7 5800X CPU.

Kernel version:

sf@trantor:~/Downloads$ uname -r
5.4.0-121-generic

How to use

Compile and run

nvcc cuda_matmul.cu -lm -o cu_mm.out
./cu_mm.out 2048 256 512
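
The two commands above can also be driven from a script. The following is a minimal sketch, not part of this repository: the helper names `build_commands` and `compile_and_run` are hypothetical, and the defaults simply repeat the file names and arguments shown above without interpreting them.

```python
import shutil
import subprocess


def build_commands(source="cuda_matmul.cu", binary="cu_mm.out", args=(2048, 256, 512)):
    """Return the compile and run commands from this README as argument lists."""
    compile_cmd = ["nvcc", source, "-lm", "-o", binary]
    run_cmd = [f"./{binary}", *map(str, args)]
    return compile_cmd, run_cmd


def compile_and_run():
    """Compile and run the demo, failing early if nvcc is not installed."""
    if shutil.which("nvcc") is None:
        raise RuntimeError("nvcc not found on PATH; install the CUDA toolkit first")
    compile_cmd, run_cmd = build_commands()
    subprocess.run(compile_cmd, check=True)  # raises if compilation fails
    subprocess.run(run_cmd, check=True)      # raises if the binary exits non-zero
```

Calling `compile_and_run()` from the folder that contains `cuda_matmul.cu` should behave like typing the two commands by hand.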

Results

On the tested system, the GPU ran the matrix multiplication about 650 times faster than the CPU.
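
A speedup figure like this is the ratio of CPU time to GPU time for the same work. The sketch below shows one way such a ratio could be computed. It is an illustration, not the measurement code used here: `time_call` and `speedup` are hypothetical helpers, and the timings in the usage line are made-up values chosen only to show the arithmetic.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(cpu_time, gpu_time):
    """Return how many times faster the GPU run was than the CPU run."""
    if gpu_time <= 0:
        raise ValueError("gpu_time must be positive")
    return cpu_time / gpu_time


# Hypothetical timings in milliseconds, not measurements from this repository.
print(speedup(1300.0, 2.0))  # 650.0
```

Note that CUDA kernel launches are asynchronous, so a host-side timer around GPU work is only meaningful if the device is synchronized before the timer stops.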
