Demystifying GPU architectures for deep learning

This folder contains code for the following blog posts.

  1. Demystifying GPU architectures for deep learning part 1
  2. Demystifying GPU architectures for deep learning part 2

Environment

All code was tested on a PC with an NVIDIA GeForce RTX 3090 GPU and an AMD Ryzen 7 5800X CPU.

Kernel version:

sf@trantor:~/Downloads$ uname -r
5.4.0-121-generic

How to use

Compile and run

nvcc cuda_matmul.cu -lm -o cu_mm.out
./cu_mm.out 2048 256 512

Results

On the test system described above, the GPU matrix multiplication ran about 650 times faster than the CPU version.
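A speedup figure like this is usually the ratio of two wall-clock timings. The sketch below shows that arithmetic in Python; it is not taken from `cuda_matmul.cu`. The helper names `time_best_of` and `speedup` are hypothetical, and the timings are made-up values chosen only to illustrate the calculation, not measurements from the test system.

```python
import time


def time_best_of(fn, repeats=3):
    """Return the shortest wall-clock time, in seconds, over `repeats` calls to fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds, accelerated_seconds):
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated_seconds must be positive")
    return baseline_seconds / accelerated_seconds


# Hypothetical timings (assumption): illustrative numbers, not measured results.
cpu_seconds = 13.0
gpu_seconds = 0.02
ratio = speedup(cpu_seconds, gpu_seconds)
print(f"GPU speedup over CPU: about {ratio:.0f}x")
```

Taking the best of several runs is one common way to reduce noise from other processes. When timing GPU code, the measurement should also wait for the kernel to finish before stopping the clock, since kernel launches are asynchronous.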
