Abstract
Sparse matrix-vector multiplication (SpMV) is ubiquitous in scientific computing, for example in iterative methods for solving linear systems and eigenvalue problems. With the emergence and development of Graphics Processing Units (GPUs), highly efficient storage formats for SpMV need to be constructed, because the performance of SpMV is determined mainly by the sparse matrix storage format. Based on the idea of the JAD format, this paper improves the ELLPACK-R format by reducing the waiting time between threads within a warp, achieving a speedup of about 1.5 in our experiments. Compared with other formats such as CSR, ELL, and BiELL, our format gives the best SpMV performance on over 70 percent of the test matrices. We also propose a parameter-based method to analyze how different formats affect performance, and we construct a formula to estimate the amount of computation and the number of iterations.
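To make the JAD-style idea mentioned above concrete, the host-side sketch below permutes the rows by decreasing nonzero count so that threads in the same warp process rows of similar length. This is only an illustration under assumed names (row_nnz, perm); it is not the paper's actual implementation.

```cpp
// Sketch: JAD-style row permutation by decreasing row length (host side).
// row_nnz[i] is the number of nonzeros in row i of the sparse matrix.
#include <algorithm>
#include <numeric>
#include <vector>

std::vector<int> sort_rows_by_length(const std::vector<int> &row_nnz)
{
    std::vector<int> perm(row_nnz.size());
    std::iota(perm.begin(), perm.end(), 0);   // start from the identity permutation
    std::sort(perm.begin(), perm.end(),
              [&](int a, int b) { return row_nnz[a] > row_nnz[b]; });
    return perm;                              // perm[i] = original index of the i-th longest row
}
```

After such a permutation, rows assigned to the same warp have similar lengths, so threads finish their loops at roughly the same time and spend less time waiting on each other.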
Highlights
Sparse matrix-vector multiplication (SpMV) is a key operation in a variety of scientific computations, such as many iterative methods for solving linear systems ( Ax = b ), image processing, and simulation
There are many storage formats for sparse matrices, such as compressed sparse row (CSR), ELL, the hybrid format (HYB), and BiELL
Sparse matrix-vector multiplication (SpMV) based on the COO format is not well suited to the Graphics Processing Unit (GPU) architecture when the matrix entries are stored in arbitrary order, as the sketch below illustrates
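The following is a rough CUDA sketch of a COO-based SpMV kernel (one thread per nonzero; all names are illustrative, not taken from the paper). When the triplets are stored in arbitrary order, the reads of x and the atomic updates of y are scattered across memory, which is why COO maps poorly to the GPU.

```cuda
// Sketch: COO SpMV, one thread per nonzero (row/col/val are the COO triplets).
__global__ void spmv_coo(int nnz,
                         const int *row, const int *col, const float *val,
                         const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nnz) {
        // Several threads may update the same output row, so an atomic add is needed;
        // with disordered triplets these accesses are uncoalesced and contended.
        atomicAdd(&y[row[i]], val[i] * x[col[i]]);
    }
}
```

In contrast, row-oriented formats such as CSR and ELL let each thread accumulate its own output entry without atomics.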
Summary
A GPU contains many stream processors, so many threads can operate on multiple groups of data simultaneously, providing high computational power and very high memory bandwidth. To improve computational efficiency, it is important to find a suitable matrix storage format and a matching computation method. There are many storage formats for sparse matrices, such as CSR, ELL, HYB, and BiELL. As shown in [5], ELL performs well for structured matrices because it provides contiguous memory access. The ELLPACK-R format presented in [6] is optimized to reduce the waiting time between different threads.
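As a reference point, here is a minimal sketch of an ELLPACK-R SpMV kernel in the spirit of [6]: one thread per row, column-major zero-padded val/col arrays, and a per-row length array rl. Array names and the layout details are assumptions for illustration.

```cuda
// Sketch: ELLPACK-R SpMV, one thread per row.
//   val, col : num_rows x max_nnz_per_row arrays, zero-padded, stored column-major,
//              so val[k * num_rows + row] is the k-th stored entry of `row`
//   rl       : number of nonzeros actually stored in each row
__global__ void spmv_ellpack_r(int num_rows,
                               const double *val, const int *col,
                               const int *rl, const double *x, double *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        double dot = 0.0;
        int len = rl[row];                 // each thread loops only over its own row length
        for (int k = 0; k < len; ++k) {
            int idx = k * num_rows + row;  // column-major layout gives coalesced loads
            dot += val[idx] * x[col[idx]];
        }
        y[row] = dot;
    }
}
```

Because each thread stops at rl[row], short rows do not iterate over padding; however, threads in the same warp still wait for the longest row in that warp, which is the imbalance the JAD-based improvement in this paper targets.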