Optimizing Tiled Matrix-Matrix Product According to Cache Performance Enhancement

Emna Hammami, Yosr Slama

doi:10.1109/bdcloud.2018.00098

Publication Date: Dec 1, 2018

Abstract

Loop tiling is an important optimization technique for improving the data locality, and hence the practical performance, of loop nests. By reordering loop iterations, it makes accesses to nearby data occur within a relatively short time frame. However, the choice of tile sizes in a tiled code leads to significant differences in performance and completion time. Selecting optimal tile sizes has therefore become a critical challenge for the performance of tiled codes as memory hierarchies grow in depth and complexity. Past works, which target only older platforms, have pursued two types of algorithms for tile size selection: analytical and empirical. Given recent processor developments, it has become necessary to adapt such selections, across a large space of tiled codes, to new architectures by taking into account target hardware characteristics such as cache size, cache line size, and set associativity. In this context, this paper revisits the tile size selection problem for the matrix-matrix product (MMP) kernel. We propose an analytical heuristic for choosing optimal tile sizes for MMP that aims to increase data reuse and reduce completion time. We also carried out a series of experiments on a multicore machine to evaluate our approach, which improves measured execution time compared to previous works.
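
To make the idea of loop tiling concrete, the following minimal sketch shows the loop structure of a tiled matrix-matrix product next to the untiled version. It is not the heuristic proposed in the paper: the function names are hypothetical, and the tile size rule (three square tiles of 8-byte elements fitting in the cache at once) is a common textbook approximation assumed here only for illustration. It ignores cache line size and set associativity, which the abstract names as relevant. In pure Python the tiling itself brings no cache benefit; the point is only the iteration order.

```python
import math


def tile_side_for_cache(cache_bytes, elem_bytes=8):
    """Assumed rule of thumb (not the paper's heuristic): largest T such that
    three T x T tiles (one each of A, B and C) fit in the cache together."""
    return max(1, math.isqrt(cache_bytes // (3 * elem_bytes)))


def matmul_naive(a, b):
    """Reference product of a (n x m) and b (m x p), no tiling."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def matmul_tiled(a, b, tile):
    """Same product, with the i, k and j loops split into blocks of side `tile`."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    # Outer loops walk over tiles; inner loops stay inside one tile, so the
    # same small blocks of a, b and c are reused before moving on.
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_c = c[i]
                    for k in range(kk, min(kk + tile, m)):
                        a_ik = a[i][k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, p)):
                            row_c[j] += a_ik * row_b[j]
    return c
```

For example, `matmul_tiled(a, b, tile_side_for_cache(32 * 1024))` would use a tile side derived from an assumed 32 KiB cache. The `min(...)` bounds handle matrix dimensions that are not multiples of the tile side.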
