Abstract

Matrix computation is a core component of machine learning and artificial intelligence, and fast matrix routines can greatly accelerate large-scale computational projects. The Basic Linear Algebra Subprograms (BLAS) standard classifies common matrix and vector operations and provides a standardized interface for them. The most widely used heterogeneous computing platforms today combine the central processing unit (CPU) and the graphics processing unit (GPU), and BLAS has been implemented on both. However, because algorithms and hardware have different characteristics, a particular matrix method should be designed for a particular processor, so choosing the right processor for a given matrix computation is important. This paper first briefly reviews BLAS, then introduces the architectures and optimization methods of the CPU and GPU. The performance of different BLAS subroutines is studied through experiments. Finally, we discuss the reasons behind the observed behavior and propose a processor-selection scheme for matrix computations.
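As a minimal sketch of the standardized interface the abstract refers to: the Level-3 BLAS routine GEMM computes C = alpha*A*B + beta*C, and NumPy's dense matrix product delegates to whichever BLAS implementation it was built against (e.g. OpenBLAS or MKL). The array shapes and scalar values here are illustrative, not taken from the paper's experiments.

```python
import numpy as np

# GEMM-style update: C = alpha * A @ B + beta * C  (Level-3 BLAS).
# NumPy's `@` operator dispatches dense products to the underlying
# BLAS library, so this exercises the standardized interface directly.
rng = np.random.default_rng(0)
A = rng.standard_normal((256, 128))
B = rng.standard_normal((128, 64))
C = rng.standard_normal((256, 64))
alpha, beta = 2.0, 0.5

C = alpha * (A @ B) + beta * C
print(C.shape)
```

Level-1 (vector-vector) and Level-2 (matrix-vector) operations follow the same pattern with lower arithmetic intensity, which is one reason their CPU/GPU performance trade-offs differ from Level-3 routines.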
