Abstract

Matrix computation is a core component of machine learning and artificial intelligence, and fast matrix routines can greatly accelerate large-scale computational projects. The Basic Linear Algebra Subprograms (BLAS) standard classifies common matrix and vector operations and provides a standardized interface for them. The most widely used heterogeneous computing platforms today combine the central processing unit (CPU) and the graphics processing unit (GPU), and BLAS has been implemented on both. However, because algorithms and hardware have different characteristics, a particular matrix method should be designed for a particular processor, so choosing the right processor for a given matrix computation is important. This paper first briefly reviews BLAS, then introduces the architectures and optimization methods of the CPU and GPU. The performance of different BLAS subroutines is studied through experiments. Finally, we discuss the reasons behind the observed behavior and propose a processor-selection scheme for matrix computations.
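As a minimal sketch of the standardized interface the abstract refers to: the Level-3 BLAS routine GEMM computes C = alpha*A*B + beta*C, and NumPy's dense matrix product delegates to whichever BLAS implementation it was built against (e.g. OpenBLAS or MKL). The array shapes and scalar values here are illustrative, not taken from the paper's experiments.

```python
import numpy as np

# GEMM-style update: C = alpha * A @ B + beta * C  (Level-3 BLAS).
# NumPy's `@` operator dispatches dense products to the underlying
# BLAS library, so this exercises the standardized interface directly.
rng = np.random.default_rng(0)
A = rng.standard_normal((256, 128))
B = rng.standard_normal((128, 64))
C = rng.standard_normal((256, 64))
alpha, beta = 2.0, 0.5

C = alpha * (A @ B) + beta * C
print(C.shape)
```

Level-1 (vector-vector) and Level-2 (matrix-vector) operations follow the same pattern with lower arithmetic intensity, which is one reason their CPU/GPU performance trade-offs differ from Level-3 routines.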
