Abstract

This paper focuses on the matrix multiplication algorithm, specifically parallel multiplication of square matrices using the Compute Unified Device Architecture (CUDA) programming model with the C programming language. Matrix multiplication is a time-consuming problem that requires substantial computational resources to speed up. As many studies have shown, it is difficult to achieve high performance with a sequential matrix multiplication algorithm on large inputs. The aim of this study is to propose a parallel algorithm that computes the product of two square matrices with better speedup than the sequential and OpenMP algorithms. In this research, biruni, a high-performance workstation in the School of Computer Sciences, USM, Malaysia, equipped with a General Purpose Graphics Processing Unit (GPGPU), was used to parallelize the matrix product algorithm. The proposed CUDA-based algorithm was compared against the sequential algorithm and parallel OpenMP versions to evaluate its speedup. The overall results show that CUDA-based parallel matrix multiplication is approximately 400 times faster than sequential matrix multiplication and approximately 4 times faster than OpenMP matrix multiplication. The proposed parallel algorithm can therefore help researchers working on problems that rely on matrix multiplication. It can also help mathematicians compute the product of two matrices in less time.
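
For orientation, the two baselines that the abstract compares against can be sketched in C as follows. This is a minimal illustration and not the authors' code: the function names (matmul_seq, matmul_omp, speedup), the row-major double-precision storage, and the choice to parallelize over rows are all assumptions made for the example. The CUDA version itself is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential baseline (hypothetical name): C = A * B for n x n matrices
 * stored in row-major order, an assumption made for this sketch. */
void matmul_seq(const double *a, const double *b, double *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* OpenMP baseline (hypothetical name): each row of C is independent,
 * so the outer loop is split across threads. Without -fopenmp the
 * pragma is ignored and the function runs sequentially. */
void matmul_omp(const double *a, const double *b, double *c, size_t n)
{
    assert(a && b && c);
    long rows = (long)n;
    #pragma omp parallel for
    for (long i = 0; i < rows; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * b[k * n + j];
            c[(size_t)i * n + j] = sum;
        }
    }
}

/* Speedup as the ratio of baseline time to parallel time
 * (both in the same unit, e.g. seconds). */
double speedup(double t_baseline, double t_parallel)
{
    return t_baseline / t_parallel;
}
```

Timing each function on the same input and passing the two measurements to speedup gives figures comparable in kind to those reported above. Compiling with an OpenMP flag such as -fopenmp (GCC or Clang) enables the parallel loop.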
