Inside the Matrix: How does matrix multiplication work inside GPUs?

DeepLearning Hero · Beginner · 🧠 Large Language Models · 2y ago
In this video, we dive into the mechanics of a GPU and learn how it performs matrix multiplication, the core computation powering deep neural networks and large language models. By the end of the video you'll know an efficient formulation of matrix multiplication and how to compute it with tiling and kernel fusion.

GEMM basics: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html
CUDA linear algebra: https://developer.nvidia.com/blog/cutlass-linear-algebra-cuda/
A100 specifications: https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth/
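The "naive implementation of matmul" chapter covers the baseline where each thread computes one output element and reads its inputs straight from global memory. As a rough sketch of that idea (my own code, not the video's; names like naiveMatmul are illustrative):

```cuda
// Naive GEMM sketch: C = A * B for row-major float matrices.
// One thread per output element; launch with, e.g.:
//   dim3 block(16, 16);
//   dim3 grid((N + 15) / 16, (M + 15) / 16);
//   naiveMatmul<<<grid, block>>>(dA, dB, dC, M, N, K);
__global__ void naiveMatmul(const float* A, const float* B, float* C,
                            int M, int N, int K) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // row of C
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // column of C
    if (row < M && col < N) {
        float acc = 0.0f;
        // Every iteration fetches A and B from slow global memory, so each
        // input element is re-read many times across threads (the memory
        // thrashing the video attributes to this formulation).
        for (int k = 0; k < K; ++k) {
            acc += A[row * K + k] * B[k * N + col];
        }
        C[row * N + col] = acc;
    }
}
```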

Chapters (12)

0:00 Introduction
2:40 GEMM basics
3:24 Naive implementation of matmul
4:19 GPU memory hierarchy
5:34 Memory thrashing of GPUs
6:00 Memory efficient implementation of matmul
6:33 Matmul with tiling
8:17 GPU execution hierarchy
9:25 Magic of power of 2
10:15 Tile quantization
11:14 Kernel fusion
12:24 Conclusion
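For the tiling and kernel fusion chapters above, a similarly hedged sketch (again my own code, not the video's): each block stages TILE × TILE sub-blocks of A and B in fast shared memory so every loaded element is reused TILE times, and the single final write is where a fused elementwise epilogue would go. TILE = 16 is an illustrative choice, not a tuned value.

```cuda
#define TILE 16  // illustrative tile size; launch with blockDim = (TILE, TILE)

__global__ void tiledMatmul(const float* A, const float* B, float* C,
                            int M, int N, int K) {
    // On-chip staging buffers shared by all threads in the block.
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Walk the K dimension one tile at a time.
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Cooperative load; zero-pad where a tile overhangs the matrix edge
        // (the wasted work behind the tile-quantization effect in the video).
        As[threadIdx.y][threadIdx.x] =
            (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        // Each element staged above is reused TILE times from shared memory
        // instead of being re-fetched from global memory.
        for (int k = 0; k < TILE; ++k) {
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        }
        __syncthreads();
    }

    if (row < M && col < N) {
        // Kernel fusion: an elementwise epilogue (say, bias add + ReLU) could
        // be applied to acc right here, avoiding a second kernel launch and a
        // round trip through global memory.
        C[row * N + col] = acc;
    }
}
```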