Abstract

We suggest a technique to reduce the storage size of sparse matrices with no loss of information. We call this technique Diagonally-Addressed (DA) storage. It exploits the typically low bandwidth of matrices arising in applications. For memory-bound algorithms, this traffic reduction has direct benefits for both uni-precision and multi-precision algorithms. In particular, we demonstrate how to apply DA storage to the Compressed Sparse Rows (CSR) format and compare the performance in computing the Sparse Matrix Vector (SpMV) product, which is a basic building block of many iterative algorithms. We investigate 1367 matrices from the SuiteSparse Matrix Collection that fit into the CSR format using signed 32-bit indices. More than 95% of these matrices fit into the DA-CSR format using 16-bit column indices, potentially after Reverse Cuthill-McKee (RCM) reordering. Using IEEE 754 floating-point scalars, we observe a performance uplift of 11% (single-threaded) or 17.5% (multithreaded) on average when the traffic exceeds the size of the last-level CPU cache. The predicted uplift in this scenario is 20%. For traffic within the CPU's combined level 2 and level 3 caches, the multithreaded performance uplift is over 40% for a few test matrices.
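To make the idea concrete, here is a minimal sketch of how diagonally-addressed column indices could work, assuming the core idea is to store each column index relative to its row (offset = column − row), so that a matrix with low bandwidth needs only 16-bit offsets instead of 32-bit absolute column indices. The function names (`to_da_csr`, `da_csr_spmv`) are illustrative, not taken from the paper.

```python
def to_da_csr(row_ptr, col_idx):
    """Convert CSR column indices to diagonal offsets (col - row).

    For a matrix with bandwidth below 2**15, every offset fits into a
    signed 16-bit integer, halving the index storage and memory traffic.
    """
    offsets = []
    for row in range(len(row_ptr) - 1):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            off = col_idx[k] - row
            assert -2**15 <= off < 2**15, "bandwidth too large for 16-bit offsets"
            offsets.append(off)
    return offsets

def da_csr_spmv(row_ptr, offsets, values, x):
    """Compute y = A @ x in DA-CSR form: column = row + stored offset."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(row_ptr) - 1):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[row + offsets[k]]
    return y

# 3x3 tridiagonal example: A = [[2,-1,0], [-1,2,-1], [0,-1,2]]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values  = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
offs = to_da_csr(row_ptr, col_idx)          # all offsets lie in {-1, 0, 1}
y = da_csr_spmv(row_ptr, offs, values, [1.0, 1.0, 1.0])
```

In a real implementation the offsets array would be typed as `int16` (e.g. a NumPy array) to realize the traffic reduction; reordering schemes such as RCM reduce the bandwidth first so that more matrices satisfy the 16-bit constraint.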
