Abstract

In this paper we propose a low-overhead optimizer for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel on the Intel Xeon Phi manycore processor. The architectural differences of such processors compared to their multicore counterparts overly expose inherent structural weaknesses of different sparse matrices, intensifying performance issues beyond the traditionally reported memory bandwidth bottleneck. We thus advocate that matrix adaptivity through runtime specialization is essential to optimizing SpMV on such processors. To this end, we present an approach that first identifies the performance bottlenecks of the kernel for a given sparse matrix either through profiling or by examining comprehensive structural features of the matrix, and then selects suitable optimizations to tackle them. Our optimization pool is based on the widely used Compressed Sparse Row (CSR) sparse matrix storage format and has low preprocessing overheads, making our overall approach practical even in the context of iterative solvers that converge in a small number of iterations. We evaluate our optimizer on Intel’s Knights Corner co-processor and demonstrate that it is able to distinguish and appropriately optimize SpMV for the majority of matrices in a large and diverse test suite, leading to significant speedups over the corresponding CSR implementation available in the latest Intel MKL library.
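For context, the sketch below shows a plain baseline CSR SpMV kernel (y = A*x) of the kind the paper's optimization pool starts from; it is an illustrative minimal implementation, not the authors' optimized code, and the array names (row_ptr, col_idx, values) are conventional CSR labels assumed here rather than taken from the paper.

```c
#include <stddef.h>

/* Baseline CSR SpMV: y = A * x.
 * row_ptr has n_rows + 1 entries; col_idx and values store the nonzeros
 * of each row contiguously. Illustrative sketch only. */
void spmv_csr(size_t n_rows,
              const size_t *row_ptr,
              const int *col_idx,
              const double *values,
              const double *x,
              double *y)
{
    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        /* Dot product of sparse row i with the dense vector x. */
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += values[k] * x[col_idx[k]];
        y[i] = sum;
    }
}
```

The irregular, indirect accesses to x through col_idx and the variable row lengths in this loop are what make the kernel sensitive to matrix structure, which is the sensitivity the proposed optimizer targets.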
