Abstract

Sparse matrix-vector multiplication (SpMV) is an essential kernel in sparse linear algebra and has been studied extensively on all modern processor and accelerator architectures. Compressed Sparse Row (CSR) is a frequently used storage format for sparse matrices. However, CSR-based SpMV performs poorly on processors with vector units. To take full advantage of SIMD acceleration in SpMV, we propose a new matrix storage format called CSR-SIMD. The new format compresses the non-zero elements into many variable-length data fragments whose memory access addresses are consecutive. As a result, the data locality of the sparse matrix A and the dense vector x improves, and the floating-point operations for each fragment can be fully vectorized on wide SIMD units. Our experimental results indicate that CSR-SIMD offers better storage efficiency and low format-conversion overhead. In addition, the new format scales well on wide SIMD units. Compared with CSR-based and BCSR-based SpMV, CSR-SIMD obtains better performance on FT1500A, Intel Xeon, and Intel Xeon Phi.
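
To make the baseline concrete, the following is a minimal sketch of CSR-based SpMV, together with a hypothetical helper that groups each row's non-zeros into runs of consecutive column indices. The helper only illustrates the general idea of variable-length fragments with consecutive addresses; the function names, the `(row, start, length)` fragment representation, and the example matrix are assumptions made for illustration and are not the CSR-SIMD layout defined in the paper.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Non-zeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is accessed indirectly through col_idx; this gather is
            # what makes plain CSR awkward for SIMD units.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def contiguous_fragments(row_ptr, col_idx):
    """Hypothetical helper (not the paper's format): split each row's
    non-zeros into runs whose column indices are consecutive.

    Returns a list of (row, start_position, length) triples."""
    frags = []
    for i in range(len(row_ptr) - 1):
        k, end = row_ptr[i], row_ptr[i + 1]
        while k < end:
            start = k
            # Extend the run while the next column index is adjacent.
            while k + 1 < end and col_idx[k + 1] == col_idx[k] + 1:
                k += 1
            frags.append((i, start, k - start + 1))
            k += 1
    return frags


# Example matrix (assumed for illustration):
# [[1, 2, 0, 0],
#  [0, 0, 3, 0],
#  [4, 0, 5, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 1, 2, 0, 2, 3]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0, 1.0, 1.0]

y = csr_spmv(row_ptr, col_idx, values, x)
frags = contiguous_fragments(row_ptr, col_idx)
```

Within each run returned by `contiguous_fragments`, both the matrix values and the matching entries of `x` sit at consecutive addresses, so a run could in principle be processed with contiguous vector loads rather than gathers.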
