Abstract

Sparse computations constitute one of the most important areas of numerical algebra and scientific computing. Because of indirect addressing, sparse codes exhibit irregular patterns of references to memory. While there are many studies on high-level optimizations of sparse computations, few deal with software prefetching. This is due both to the irregular memory accesses, which are incompatible with regular prefetching, and to the high efficiency and complexity of the hardware prefetch units included in modern processors, such as the Intel Core micro-architecture. In this paper, we show the efficiency and the limitations of hardware prefetch units, and we propose a technique that uses software prefetch instructions in combination with hardware support to better manage the cache and improve overall code performance. To achieve this goal, the cache behavior of the sparse matrix-vector multiplication (SpMV) is analyzed, focusing on the code structure and the order in which the data are accessed. The main cache parameters are identified and their impact on cache performance is evaluated. These parameters are included in a matrix analyzer that determines in advance the efficiency of software prefetching. Furthermore, the efficiency of software prefetching is analyzed on a large set of sparse matrices. Experimental results show that the matrix analyzer makes accurate predictions and that execution time improves by up to 40%.
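To illustrate the kind of combined hardware/software prefetching the abstract describes, the sketch below shows a CSR-format SpMV kernel in which the irregularly accessed source vector x is prefetched a fixed number of nonzeros ahead. This is a minimal, hedged example: the CSR layout, the use of Intel's _mm_prefetch intrinsic, and the lookahead distance PF_DIST are illustrative assumptions, not the paper's tuned scheme or its matrix-analyzer-driven decision.

```c
#include <xmmintrin.h>   /* _mm_prefetch, _MM_HINT_T0 */

/* Illustrative lookahead distance, measured in nonzeros; the paper's
 * analyzer would decide whether and how to prefetch per matrix. */
#define PF_DIST 16

/* CSR sparse matrix-vector multiply y = A*x with a software prefetch
 * on the indirectly addressed entries of x.  The streams row_ptr,
 * col_idx and val are accessed sequentially and are typically covered
 * by the hardware prefetcher; only x needs an explicit hint. */
void spmv_csr_prefetch(int n, const int *row_ptr, const int *col_idx,
                       const double *val, const double *x, double *y)
{
    int nnz = row_ptr[n];
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            /* Hint the cache about the future indirect access
             * x[col_idx[k + PF_DIST]] before it is needed. */
            if (k + PF_DIST < nnz)
                _mm_prefetch((const char *)&x[col_idx[k + PF_DIST]],
                             _MM_HINT_T0);
            sum += val[k] * x[col_idx[k]];
        }
        y[i] = sum;
    }
}
```

Whether such a hint helps depends on the matrix structure and the cache parameters the paper identifies; for matrices whose column indices are already cache-friendly, the extra instructions can be pure overhead.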
