Abstract

Sparse matrix‐vector multiplication (SpMV) is an essential kernel in sparse linear algebra and has been studied extensively on all modern processor and accelerator architectures. Compressed Sparse Row (CSR) is a frequently used storage format for sparse matrices. However, CSR‐based SpMV performs poorly on processors with vector units. To take full advantage of SIMD acceleration in SpMV, we propose a new matrix storage format called CSR‐SIMD. The new format compresses the non‐zero elements into many variable‐length data fragments whose memory access addresses are consecutive. As a result, the data locality of the sparse matrix A and the dense vector x improves, and the floating‐point operations for each fragment can be fully vectorized on wide SIMD units. Our experimental results indicate that CSR‐SIMD has better storage efficiency and low format‐conversion overhead. In addition, the new format scales well on wide SIMD units. Compared with CSR‐based and BCSR‐based SpMV, CSR‐SIMD obtains better performance on FT1500A, Intel Xeon, and Intel Xeon Phi.
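To make the contrast concrete, the following is a minimal sketch of the standard CSR SpMV kernel, followed by a helper that groups each row's non‐zeros into runs with consecutive column indices. The helper `consecutive_fragments` and its `(row, start, length)` output are hypothetical illustrations of what "fragments with consecutive memory access addresses" could mean; the abstract does not specify the actual CSR‐SIMD layout, so this should not be read as the authors' format.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in standard CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read through col_idx, an indirect (gather-style) access
            # that is hard to vectorize efficiently.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def consecutive_fragments(row_ptr, col_idx):
    """Hypothetical helper (an assumption, not the paper's algorithm):
    split each row into maximal runs of consecutive column indices.
    Returns a list of (row, start_offset_in_values, length) tuples."""
    frags = []
    for i in range(len(row_ptr) - 1):
        k, end = row_ptr[i], row_ptr[i + 1]
        while k < end:
            start = k
            # Extend the run while the next column index is adjacent.
            while k + 1 < end and col_idx[k + 1] == col_idx[k] + 1:
                k += 1
            frags.append((i, start, k - start + 1))
            k += 1
    return frags


# Small 3x4 example:
# A = [[1, 2, 0, 0],
#      [0, 0, 3, 0],
#      [4, 0, 5, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 1, 2, 0, 2, 3]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0, 1.0, 1.0]

y = csr_spmv(row_ptr, col_idx, values, x)
frags = consecutive_fragments(row_ptr, col_idx)
```

Within each run, both `values` and `x` are read from contiguous addresses, so the products could in principle be computed with plain vector loads rather than gathers; this is the property the abstract attributes to its fragments.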

