Abstract
The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building block for many numerical linear algebra kernel operations or graph traversal applications. Parallelizing SymmSpMV on today’s multicore platforms with up to 100 cores is difficult due to the need to manage conflicting updates on the result vector. Coloring approaches can be used to solve this problem without data duplication, but existing coloring algorithms do not take load balancing and deep memory hierarchies into account, hampering scalability and full-chip performance. In this work, we propose the recursive algebraic coloring engine (RACE), a novel coloring algorithm and open-source library implementation that eliminates the shortcomings of previous coloring methods in terms of hardware efficiency and parallelization overhead. We describe the level construction, distance-k coloring, and load balancing steps in RACE, use it to parallelize SymmSpMV, and compare its performance on 31 sparse matrices with other state-of-the-art coloring techniques and Intel MKL on two modern multicore processors. RACE outperforms all other approaches substantially. By means of a parameterized roofline model, we analyze the SymmSpMV performance in detail and discuss outliers. While we focus on SymmSpMV in this article, our algorithm and software are applicable to any sparse matrix operation with data dependencies that can be resolved by distance-k coloring.
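The conflict-resolution idea behind the abstract can be illustrated with a greedy distance-2 coloring over the matrix's adjacency graph: vertices (rows) within two hops of each other receive distinct colors, so rows in the same color class never touch a common result-vector entry. This is an illustrative sketch only, not RACE's recursive level-based algorithm; the function name and adjacency-list representation are assumptions for the example.

```python
def greedy_distance2_coloring(adj):
    """Greedy distance-2 coloring of an undirected graph.

    adj[v] lists the neighbors of vertex v. Any two vertices that are
    at most two hops apart are assigned different colors, so all
    vertices of one color can be processed concurrently without
    write conflicts in a SymmSpMV-style kernel.
    """
    n = len(adj)
    color = [-1] * n          # -1 means "not yet colored"
    for v in range(n):
        forbidden = set()
        for u in adj[v]:      # distance-1 neighbors
            if color[u] >= 0:
                forbidden.add(color[u])
            for w in adj[u]:  # distance-2 neighbors
                if color[w] >= 0:
                    forbidden.add(color[w])
        c = 0                 # smallest color not seen in the 2-hop ball
        while c in forbidden:
            c += 1
        color[v] = c
    return color
```

On a path graph 0-1-2-3, for instance, vertices 1 and 3 are two hops apart and therefore get distinct colors, whereas vertices 0 and 3 (three hops apart) may share one.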
Highlights
The efficient solution of linear systems or eigenvalue problems involving large sparse matrices has been an active research field in parallel and high-performance computing for many decades.
While we focus on SymmSpMV in this article, our algorithm and software are applicable to any sparse matrix operation with data dependencies that can be resolved by distance-k coloring.
Before we evaluate the performance across the full set of matrices presented in Table 2, we return to the analysis of the SymmSpMV performance and data traffic for the Spin-26 matrix that we presented in Section 3.3 for the established coloring approaches.
Summary
The efficient solution of linear systems or eigenvalue problems involving large sparse matrices has been an active research field in parallel and high-performance computing for many decades. The solvers are typically based on iterative subspace methods and may include advanced preconditioning techniques. Two components, sparse matrix-vector multiplication (SpMV) and coloring techniques, are crucial for hardware efficiency and parallel scalability. These two components are considered to be orthogonal, i.e., hardware efficiency for SpMV is mainly related to data formats and local structures, while coloring is used to address dependencies in the enclosing iteration scheme. The hardware-efficient parallelization of symmetric SpMV requires handling both of these aspects efficiently.
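The conflicting updates mentioned in the summary arise because a symmetric matrix is typically stored as only one triangle, and each stored entry then contributes to two entries of the result vector. A minimal sketch of such a SymmSpMV kernel over a CSR-stored upper triangle follows; it is illustrative only (RACE's production kernels are compiled code), and the function name is an assumption.

```python
def symm_spmv(n, row_ptr, col_idx, val, x):
    """y = A*x for a symmetric A whose upper triangle (incl. diagonal)
    is stored in CSR format (row_ptr, col_idx, val)."""
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = val[k]
            y[i] += a * x[j]       # contribution of the stored entry (i, j)
            if j != i:
                y[j] += a * x[i]   # symmetric contribution (j, i): this write
                                   # conflicts when rows are processed in
                                   # parallel, which is what distance-2
                                   # coloring of the rows resolves
    return y
```

The second update is the source of the race condition: two threads handling different rows i and i' may both write y[j] if their rows share a nonzero column j. Scheduling rows so that no two concurrently processed rows lie within distance 2 in the adjacency graph removes the conflict without duplicating y.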