Abstract:Matrix multiplication is a basic operation in scientific computing. Efficient implementation of matrix multiplication can speed up many applications. In this paper, we implement an efficient matrix multiplication on GPU using NVIDIA’s CUDA. The experiment shows that our implementation is as fast as the implementation in CUBLAS, and the speed of our implementation can reach the peak speed’s 97%, on Geforce GTX260.