Abstract

Sparse computations constitute one of the most important areas of numerical algebra and scientific computing. Because of indirect addressing, sparse codes exhibit irregular patterns of references to memory. While there are many studies on high-level optimizations of sparse computations, few deal with software prefetching. This is due both to the irregular memory accesses, which are incompatible with regular prefetching, and to the high efficiency and complexity of the hardware prefetch units included in modern processors, such as the Intel Core micro-architecture. In this paper, we show the efficiency and the limitations of hardware prefetch units, and we propose a technique that uses software prefetch instructions in combination with hardware support to better manage the cache and improve overall code performance. To achieve this goal, the cache behavior of the sparse matrix-vector multiplication (SpMV) is analyzed, focusing on the code structure and the order in which the data are accessed. The main cache parameters are identified and their impact on cache performance is evaluated. These parameters are included in a matrix analyzer that determines in advance the efficiency of software prefetching. Furthermore, the efficiency of software prefetching is analyzed on a large set of sparse matrices. Experimental results show that the matrix analyzer makes accurate predictions and that execution time improves by up to 40%.
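To illustrate the kind of combined hardware/software prefetching the abstract describes, the sketch below shows a CSR-format SpMV kernel in which the irregularly accessed source vector x is prefetched a fixed number of nonzeros ahead. This is a minimal, hedged example: the CSR layout, the use of Intel's _mm_prefetch intrinsic, and the lookahead distance PF_DIST are illustrative assumptions, not the paper's tuned scheme or its matrix-analyzer-driven decision.

```c
#include <xmmintrin.h>   /* _mm_prefetch, _MM_HINT_T0 */

/* Illustrative lookahead distance, measured in nonzeros; the paper's
 * analyzer would decide whether and how to prefetch per matrix. */
#define PF_DIST 16

/* CSR sparse matrix-vector multiply y = A*x with a software prefetch
 * on the indirectly addressed entries of x.  The streams row_ptr,
 * col_idx and val are accessed sequentially and are typically covered
 * by the hardware prefetcher; only x needs an explicit hint. */
void spmv_csr_prefetch(int n, const int *row_ptr, const int *col_idx,
                       const double *val, const double *x, double *y)
{
    int nnz = row_ptr[n];
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            /* Hint the cache about the future indirect access
             * x[col_idx[k + PF_DIST]] before it is needed. */
            if (k + PF_DIST < nnz)
                _mm_prefetch((const char *)&x[col_idx[k + PF_DIST]],
                             _MM_HINT_T0);
            sum += val[k] * x[col_idx[k]];
        }
        y[i] = sum;
    }
}
```

Whether such a hint helps depends on the matrix structure and the cache parameters the paper identifies; for matrices whose column indices are already cache-friendly, the extra instructions can be pure overhead.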
