Tim M 57b5cefde5 Updated all `.md` files to contain newest image 2 years ago
README.md 57b5cefde5 Updated all `.md` files to contain newest image 2 years ago
cuda_matmul.cu 4966d51985 added code for gpu arch (part 1) blog post 2 years ago
profiler_demo_utils.py 5e428a94f2 added code for part 2 2 years ago
pytorch_profiler_demo.py 5e428a94f2 added code for part 2 2 years ago

README.md

Demystifying GPU architectures for deep learning

This folder contains the code for the following blog posts:

  1. Demystifying GPU architectures for deep learning part 1
  2. Demystifying GPU architectures for deep learning part 2

Environment

All code was tested on a PC with an NVIDIA RTX 3090 GPU and an AMD Ryzen 7 5800X CPU.

Kernel version:

sf@trantor:~/Downloads$ uname -r
5.4.0-121-generic
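
To record the same details alongside benchmark numbers, a minimal sketch using Python's standard `platform` module might look like the following. The `describe_environment` helper is hypothetical and is not part of this repository.

```python
import platform


def describe_environment():
    """Collect basic host details worth recording next to benchmark results.

    Hypothetical helper for illustration; not part of this repository.
    """
    return {
        "kernel": platform.release(),    # on Linux, the value `uname -r` prints
        "machine": platform.machine(),   # CPU architecture string
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```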

How to use

Compile and run

nvcc cuda_matmul.cu -lm -o cu_mm.out
./cu_mm.out 2048 256 512

Results

On the tested system, the GPU was about 650 times faster than the CPU.
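
This README does not say how the speedup was measured. As a rough sketch of how such a figure is usually derived, the Python below times a naive CPU matrix multiply and divides the CPU time by the GPU time. The helper names (`matmul_cpu`, `time_call`, `speedup`) and the timings in the usage section are assumptions for illustration; they are not taken from `cuda_matmul.cu`.

```python
import time


def matmul_cpu(a, b):
    """Naive triple-loop matrix multiply on nested lists (CPU reference)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            a_ik = a[i][k]
            for j in range(cols):
                out[i][j] += a_ik * b[k][j]
    return out


def time_call(fn, *args):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def speedup(cpu_seconds, gpu_seconds):
    """Return how many times faster the GPU run was than the CPU run."""
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    product, cpu_seconds = time_call(matmul_cpu, a, b)
    print(product)
    # Hypothetical timings, not measurements from this repository.
    print(speedup(13.0, 0.02))
```

When timing the GPU side, note that CUDA kernel launches are asynchronous: the measured interval should end only after the device has finished (for example after `cudaDeviceSynchronize()` or by using CUDA events), otherwise the GPU time is underestimated and the speedup overstated.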
