Beating cuBLAS in Single-Precision General Matrix Multiplication


In this blog post, we’ll walk through an implementation of the SGEMM (Single-precision GEneral Matrix Multiply) operation, defined as C := alpha*A*B + beta*C. We will review three different kernels, each optimized for specific matrix sizes. Our final implementation is optimized for Ampere arc...
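
To make the definition C := alpha*A*B + beta*C concrete, here is a minimal CPU reference sketch in plain C. It is not one of the kernels from the post; the name `sgemm_ref`, the row-major layout, and the absence of transposes or leading-dimension arguments are assumptions made purely for illustration. A naive routine like this is typically only useful as a correctness baseline for checking the output of an optimized kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Naive reference SGEMM: C := alpha*A*B + beta*C.
 * Assumptions (not taken from the post): row-major storage, no
 * transposes, tightly packed matrices.
 *   A is m x k, B is k x n, C is m x n. */
void sgemm_ref(int m, int n, int k, float alpha, const float *A,
               const float *B, float beta, float *C)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            /* Dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[(size_t)i * k + p] * B[(size_t)p * n + j];
            /* Scale the product and blend in the previous value of C. */
            C[(size_t)i * n + j] = alpha * acc + beta * C[(size_t)i * n + j];
        }
    }
}
```

For example, with 2x2 inputs A = [1 2; 3 4], B = [5 6; 7 8], C filled with ones, alpha = 2 and beta = 3, the product A*B is [19 22; 43 50], so the updated C is [41 47; 89 103].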
