Abstract

Sparse matrix-vector multiplication (SpMV) is a fundamental kernel in computational science, and the performance of a large number of applications depends on its efficiency. SpMV is a bandwidth-limited operation that is challenging to optimize when the matrix has an irregular structure. Over the last few years, a large body of research has been devoted to implementing SpMV on throughput-oriented manycore processors. Several sparse matrix formats have been proposed, with different strengths and weaknesses, along with alternative optimization strategies such as row reordering. This paper proposes an architecture-aware technique for improving the performance of SpMV on Graphics Processing Units (GPUs). The optimization is based on a novel heuristic that reduces cache memory accesses within hardware-level thread groups (warps). The technique is designed and implemented using a variation of the sliced ELL sparse format, but the underlying idea is structure-independent and can easily be adapted to other sparse representations. We tested the proposed architecture-aware optimization on a large set of benchmarks from heterogeneous application domains. The results show consistent improvements for double-precision calculations: an average 9% increase in performance, with speedups of up to 2.24× over the baseline.
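
For readers unfamiliar with the sliced ELL layout that the paper builds on, the sketch below shows a baseline double-precision SpMV kernel over that format in CUDA. It illustrates only the standard layout (slice height equal to the warp size, column-major storage within each slice, rows padded to the slice's longest row), not the paper's variation or its cache-access heuristic; all identifiers and the padding convention are assumptions for illustration.

```cuda
// Minimal sliced-ELL SpMV sketch: one thread per row, one warp per slice.
// Padded entries are assumed to hold value 0.0 and column index 0, so they
// contribute nothing to the dot product while keeping x[] accesses in bounds.
#include <cuda_runtime.h>

#define SLICE_HEIGHT 32  // assume slice height == warp size

__global__ void spmv_sell(int n_rows,
                          const int    *slice_ptr,  // offset of each slice in cols/values
                          const int    *cols,       // column indices, column-major per slice
                          const double *values,     // nonzeros plus padding
                          const double *x,          // input vector
                          double       *y)          // output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n_rows) return;

    int slice = row / SLICE_HEIGHT;  // slice this row belongs to
    int lane  = row % SLICE_HEIGHT;  // row's position inside the slice

    int begin = slice_ptr[slice];
    int end   = slice_ptr[slice + 1];
    int width = (end - begin) / SLICE_HEIGHT;  // padded width of this slice

    double sum = 0.0;
    for (int j = 0; j < width; ++j) {
        // Column-major storage within the slice means the threads of a warp
        // read consecutive memory locations here, so the loads coalesce.
        int idx = begin + j * SLICE_HEIGHT + lane;
        sum += values[idx] * x[cols[idx]];
    }
    y[row] = sum;
}
```

Because padding cost is paid per slice rather than per matrix, sliced ELL wastes far less memory than plain ELL on matrices with a skewed row-length distribution, which is what makes it a common starting point for GPU SpMV optimizations like the one described above.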
