Abstract
Sparse matrix-vector multiplication (SpMV) is a fundamental kernel in computational science, and the performance of a large number of applications depends on its efficiency. SpMV is a bandwidth-limited operation that is challenging to optimize when the matrix has an irregular structure. Over the last few years, a large body of research has been devoted to implementing SpMV on throughput-oriented manycore processors. Several sparse matrix formats have been proposed, each with different strengths and weaknesses, along with alternative optimization strategies such as row reordering. This paper proposes an architecture-aware technique for improving the performance of SpMV on Graphics Processing Units (GPUs). The optimization is based on a novel heuristic that reduces cache memory accesses within hardware-level thread groups (warps). The technique is designed and implemented using a variation of the sliced ELL sparse format; however, the underlying idea is structure-independent and can easily be adapted to other sparse representations. We tested the proposed architecture-aware optimization on a large set of benchmarks from heterogeneous application domains. The results show consistent improvements for double-precision calculations: an average 9% increase in performance, with speedups of up to 2.24x over the baseline.
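For context, the sketch below shows what a baseline double-precision SpMV kernel over a sliced ELL layout typically looks like in CUDA; the paper's heuristic would act on top of such a layout. This is a minimal illustration, not the paper's implementation: all names (spmv_sell, slice_ptr, cols, vals, SLICE_HEIGHT) are assumptions, rows are assumed padded to a multiple of the slice height, and padding entries are assumed to store a value of 0.0 with a valid column index so they contribute nothing to the result.

    #include <cuda_runtime.h>

    #define SLICE_HEIGHT 32   // rows per slice; matches the warp size (assumption)

    // One thread per row. Each slice of SLICE_HEIGHT consecutive rows is padded
    // to its own maximum row length and stored column-major, so the threads of
    // a warp read consecutive elements of vals/cols (coalesced accesses).
    __global__ void spmv_sell(int n_rows,             // padded to a multiple of SLICE_HEIGHT
                              const int    *slice_ptr, // offset of each slice in vals/cols
                              const int    *cols,      // padded column indices
                              const double *vals,      // padded nonzero values
                              const double *x,         // dense input vector
                              double       *y)         // dense output vector
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= n_rows) return;

        int slice = row / SLICE_HEIGHT;
        int lane  = row % SLICE_HEIGHT;

        // The slice width (its padded row length) is recoverable from the offsets.
        int width = (slice_ptr[slice + 1] - slice_ptr[slice]) / SLICE_HEIGHT;

        double sum = 0.0;
        int base = slice_ptr[slice] + lane;
        for (int j = 0; j < width; ++j)
            sum += vals[base + j * SLICE_HEIGHT] * x[cols[base + j * SLICE_HEIGHT]];

        y[row] = sum;
    }

In this layout the accesses to vals and cols are already coalesced; the irregular, cache-unfriendly traffic comes from the gathers on x, which is the access pattern a warp-level locality heuristic such as the one described in the abstract would target.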