Abstract

Many-core architectures, represented by the Graphics Processing Unit (GPU), have become increasingly popular in scientific computing and data analysis, and the Compute Unified Device Architecture (CUDA) allows developers to carry out computation on GPUs conveniently. Matrix operations are among the most important applications of GPU computing and CUDA, appearing widely in domains such as graph processing and artificial intelligence. This paper focuses on using GPUs and CUDA to accelerate the matrix power operation (MPO), which arises in many engineering problems and can be time-consuming when the matrices are large. Three techniques are used to speed up the MPO process: parallel matrix reduction, parallel matrix multiplication, and CUDA dynamic parallelism. The experiments were carried out on an NVIDIA RTX 2080 Super GPU and an Intel(R) Core(TM) i7-10750H CPU. The code was compiled and executed in Visual Studio 2019 Community; the GPU driver version is 461.40, the CUDA version is 11.1, and the NVCC version is 11.1.74. With all three techniques, the MPO process achieves speedups of hundreds of times over sequential multiplication with the Eigen library, even though the algorithm's code is hardly optimized. Moreover, as the power increases, the algorithm becomes even more efficient, because the time spent on data transfer and GPU memory allocation is relatively fixed.
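The abstract does not include the paper's CUDA kernels, but the "parallel reduction" idea for a matrix power can be sketched on the host side: treat A^n as a reduction over n copies of A, multiplying adjacent pairs in rounds so that only O(log n) rounds are needed, with the products within each round independent (and thus parallelizable on a GPU). The sketch below is a minimal NumPy illustration of that reduction structure, not the paper's implementation; the function name and the use of NumPy are assumptions for illustration.

```python
import numpy as np

def matrix_power_reduction(a: np.ndarray, n: int) -> np.ndarray:
    """Compute a**n (n >= 1) by pairwise tree reduction over n copies of a.

    Each round multiplies adjacent pairs, halving the number of matrices;
    the pair products inside one round are independent of each other,
    which is what a GPU version can execute in parallel.
    """
    mats = [a] * n
    while len(mats) > 1:
        nxt = []
        for i in range(0, len(mats) - 1, 2):
            nxt.append(mats[i] @ mats[i + 1])  # independent pair products
        if len(mats) % 2:                      # odd leftover carries over
            nxt.append(mats[-1])
        mats = nxt
    return mats[0]
```

With this structure, raising the power n mostly adds work inside already-parallel rounds, which is consistent with the abstract's observation that efficiency improves as the power grows while transfer and allocation costs stay roughly fixed.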
