Abstract
Loop tiling is an important optimization technique for improving the data locality, and hence the practical performance, of loop nests. By reordering loop iterations, it ensures that accesses to the same or neighboring data occur within a relatively short timeframe. However, the performance and completion time of a tiled code vary significantly with the tile sizes chosen. Selecting optimal tile sizes has therefore become a critical challenge for the performance of tiled codes, all the more so as memory hierarchies grow in depth and complexity. Past works, which target only older platforms, have pursued two types of algorithms for tile size selection: analytical and empirical. Given recent processor developments, it has become necessary to adapt such selections, over a large space of tiled codes, to new architectures by taking into account target hardware characteristics such as cache size, cache line size and set associativity. In this context, this paper revisits the tile size selection problem for the matrix-matrix product (MMP) kernel. We propose an analytical heuristic for choosing optimal tile sizes for MMP, with the aim of increasing data reuse and reducing completion time. We also carried out a series of experiments on a multicore machine to evaluate the benefit of our approach, which improves measured execution time compared to previous works.
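To make the transformation concrete, the following is a minimal sketch of a tiled MMP in C, assuming square row-major matrices. The function names (mmp_naive, mmp_tiled, min_size) and the tile-size parameters ti, tj and tk are illustrative assumptions, not taken from the paper, and the tile sizes are left as plain inputs: the sketch does not implement the analytical heuristic proposed here.

```c
#include <assert.h>
#include <stddef.h>

/* Smaller of two sizes; used to clamp a tile at the matrix edge. */
static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Reference triple loop: C = A * B for n x n row-major matrices. */
void mmp_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* Tiled variant: the same iterations, regrouped into ti x tk x tj blocks.
 * ti, tj, tk are hypothetical tile-size parameters supplied by the caller;
 * how to choose them is exactly the tile size selection problem. */
void mmp_tiled(size_t n, size_t ti, size_t tj, size_t tk,
               const double *A, const double *B, double *C)
{
    assert(ti > 0 && tj > 0 && tk > 0);

    /* C is accumulated across k-tiles, so it must start at zero. */
    for (size_t x = 0; x < n * n; x++)
        C[x] = 0.0;

    /* Outer loops walk over tiles. */
    for (size_t ii = 0; ii < n; ii += ti) {
        for (size_t kk = 0; kk < n; kk += tk) {
            for (size_t jj = 0; jj < n; jj += tj) {
                size_t i_end = min_size(ii + ti, n);
                size_t k_end = min_size(kk + tk, n);
                size_t j_end = min_size(jj + tj, n);

                /* Inner loops stay inside one tile, so the touched parts
                 * of A, B and C are reused while still close in time. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions compute the same product; only the iteration order differs. With floating-point inputs the results may differ in the last bits because the tiled version sums the k-terms in a different order. Tile sizes that do not divide n are handled by the clamping in min_size.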