Abstract

Repeatedly performing sparse matrix‐vector multiplication (SpMV) followed by transposed sparse matrix‐vector multiplication (SpMTV) with the same matrix is a part of several algorithms, for example, the Lanczos biorthogonalization algorithm and the biconjugate gradient method. Such algorithms can benefit from combining parallel SpMV and SpMTV into a single operation we call joint direct and transposed sparse matrix‐vector multiplication (SpMMTV). In this article, we present a parallel SpMMTV algorithm for shared‐memory CPUs. The algorithm uses a sparse matrix format that divides the stored matrix into sparse matrix blocks and compresses the row and column indices of the matrix. This sparse matrix format can also be used for SpMV, SpMTV, and similar sparse matrix‐vector operations. We expand upon existing research by suggesting new variants of the parallel SpMMTV algorithm and by extending the algorithm to efficiently support symmetric matrices. We compare the performance of the presented parallel SpMMTV algorithm with alternative approaches, which use state‐of‐the‐art sparse matrix formats and libraries, using sparse matrices from real‐world applications. The performance results indicate that the median performance of our proposed parallel SpMMTV algorithm is up to 45% higher than that of the alternative approaches.
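
For illustration only, here is a minimal sequential sketch of the fused operation, assuming plain CSR storage and a hypothetical function name spmmtv_csr; the article's actual algorithm is parallel and uses a blocked format with compressed row and column indices. The sketch shows why fusing helps: computing y = Ax and z = A^T w in one sweep loads each nonzero of A once and uses it for both products.

#include <stddef.h>

/* Fused y = A*x and z = A^T*w over CSR storage (illustrative sketch;
 * not the article's blocked, index-compressed format). Each nonzero
 * a_ij is loaded once and contributes to both products, which is the
 * memory-traffic saving that motivates joint SpMV/SpMTV (SpMMTV). */
void spmmtv_csr(size_t n_rows,
                const size_t *row_ptr, /* CSR row pointers, length n_rows + 1 */
                const size_t *col_idx, /* column index of each nonzero */
                const double *val,     /* value of each nonzero */
                const double *x,       /* input vector for y = A*x */
                const double *w,       /* input vector for z = A^T*w */
                double *y,             /* output, length n_rows */
                double *z)             /* output, length n_cols, pre-zeroed */
{
    for (size_t i = 0; i < n_rows; ++i) {
        double yi = 0.0;
        const double wi = w[i];
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            const size_t j = col_idx[k];
            yi += val[k] * x[j];  /* row i's contribution to y = A*x */
            z[j] += val[k] * wi;  /* scatter row i's contribution to z = A^T*w */
        }
        y[i] = yi;
    }
}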
