Square Matrix Multiplication Using CUDA on GP-GPU

Ali Olow Jimale, Fakhitah Ridzuan, Wan Mohd Nazmee Wan Zainon

doi:10.1016/j.procs.2019.11.138

Open Access

https://doi.org/10.1016/j.procs.2019.11.138

Journal: Procedia Computer Science
Publication Date: Jan 1, 2019
Citations: 1
License type: CC BY-NC-ND

Affiliation: Universiti Sains Malaysia

Abstract

This paper focuses on the matrix multiplication algorithm, specifically parallel multiplication of square matrices using the Compute Unified Device Architecture (CUDA) programming model with the C programming language. Matrix multiplication is a time-consuming problem that requires substantial computational resources to speed up. As many studies have shown, it is difficult to achieve high performance with a sequential matrix multiplication algorithm on large inputs. This study proposes a parallel algorithm that computes the product of two square matrices with better speedup than the sequential and OpenMP algorithms. In this research, biruni (a super machine workstation) in the School of Computer Sciences, USM, Malaysia, equipped with a General Purpose Graphics Processing Unit (GP-GPU), was used to parallelize the matrix product algorithm. The proposed CUDA-based algorithm was compared against the sequential algorithm and parallel OpenMP versions to evaluate its speedup. The overall results show that CUDA-based parallel matrix multiplication is approximately 400 times faster than sequential matrix multiplication and 4 times faster than the OpenMP matrix multiplication algorithms. Therefore, the proposed parallel algorithm can help researchers working on applications that involve matrix multiplication. It can also help mathematicians calculate the product of any two matrices and obtain the result in a shorter time.
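
For orientation, the two baselines named in the abstract can be sketched in C as follows. This is a minimal illustration, not the authors' code: the function names `matmul_seq`, `matmul_omp`, and `speedup`, the row-major `double` layout, and the choice to parallelize only the outer loop are all assumptions. Speedup is taken here as the baseline time divided by the parallel time, which is the usual definition and is assumed to match the paper's.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential baseline (hypothetical name): C = A * B for n x n
 * matrices stored in row-major order. */
void matmul_seq(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* OpenMP variant (hypothetical name): each row of C is independent,
 * so the outer loop is divided among threads. Without -fopenmp the
 * pragma is ignored and the function runs sequentially. */
void matmul_omp(const double *a, const double *b, double *c, size_t n)
{
    long rows = (long)n;
    #pragma omp parallel for
    for (long i = 0; i < rows; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * b[k * n + j];
            c[(size_t)i * n + j] = sum;
        }
    }
}

/* Speedup as baseline time over parallel time (assumed definition). */
double speedup(double t_baseline, double t_parallel)
{
    return t_baseline / t_parallel;
}
```

Both functions perform the same n^3 multiply-add operations; only the distribution of rows across threads differs. A CUDA version would typically go further and assign each output element `c[i * n + j]` to its own GPU thread, though the abstract does not state the mapping the authors used.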

Full Text