Beating cuBLAS in Single-Precision General Matrix Multiplication
salykova.github.io
In this blog post, we’ll walk through an implementation of the SGEMM (Single-precision GEneral Matrix Multiply) operation, defined as C := alpha*A*B + beta*C. We will review three different kernels, each optimized for specific matrix sizes. Our final implementation is optimized for Ampere arc...
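To make the definition C := alpha*A*B + beta*C concrete, here is a minimal CPU reference sketch in C++. It is not one of the kernels from the post. The function name `reference_sgemm`, the row-major layout, and the use of `std::vector` are assumptions made for illustration only (cuBLAS itself expects column-major storage). A naive routine like this is mainly useful as a correctness baseline when checking an optimized kernel's output.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference routine (an illustration, not code from the post).
// Computes C := alpha * A * B + beta * C in single precision.
// Assumed layout: row-major, A is M x K, B is K x N, C is M x N.
void reference_sgemm(std::size_t M, std::size_t N, std::size_t K,
                     float alpha,
                     const std::vector<float>& A,
                     const std::vector<float>& B,
                     float beta,
                     std::vector<float>& C) {
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            // Dot product of row i of A with column j of B.
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            // Scale the product by alpha and blend in the old C by beta.
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

With alpha = 1 and beta = 0 this reduces to a plain matrix product; a nonzero beta accumulates the scaled product into the existing contents of C.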