Tim M 57b5cefde5 Updated all `.md` files to contain newest image 2 years ago
README.md 57b5cefde5 Updated all `.md` files to contain newest image 2 years ago
cuda_matmul.cu 4966d51985 added code for gpu arch (part 1) blog post 2 years ago
profiler_demo_utils.py 5e428a94f2 added code for part 2 2 years ago
pytorch_profiler_demo.py 5e428a94f2 added code for part 2 2 years ago

Demystifying GPU architectures for deep learning

This folder contains code for the following blog posts.

  1. Demystifying GPU architectures for deep learning part 1
  2. Demystifying GPU architectures for deep learning part 2

Environment

All code was tested on a PC with an NVIDIA RTX 3090 GPU and an AMD Ryzen 7 5800X CPU.

Kernel version:

sf@trantor:~/Downloads$ uname -r
5.4.0-121-generic

How to use

Compile and run

nvcc cuda_matmul.cu -lm -o cu_mm.out
./cu_mm.out 2048 256 512

Results

On the tested system, the GPU ran the matrix multiplication about 650 times faster than the CPU.
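
For readers who want to see how a speedup figure of this kind is usually derived, here is a minimal Python sketch. It is not taken from cuda_matmul.cu: the helper names matmul_naive, time_call and speedup are hypothetical, and the pure-Python multiply only stands in for a CPU baseline. The idea is to time the same workload on both devices and divide the baseline time by the accelerated time.

```python
import time


def matmul_naive(a, b):
    """Multiply an m x k matrix by a k x n matrix stored as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            a_ip = a[i][p]
            for j in range(n):
                c[i][j] += a_ip * b[p][j]
    return c


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds, accelerated_seconds):
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated time must be positive")
    return baseline_seconds / accelerated_seconds


if __name__ == "__main__":
    # Small illustrative inputs; real benchmarks use far larger matrices.
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    cpu_seconds = time_call(matmul_naive, a, b)
    print(f"CPU baseline: {cpu_seconds:.6f} s")
```

When the accelerated side is a CUDA kernel, the GPU must finish its work before the timer stops, because kernel launches return to the host asynchronously. A ratio near 650 from speedup would correspond to the result reported above.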
