Abstract

Loop tiling is an effective optimization to improve performance of multiply nested loops, which are the most time-consuming parts in many programs. Most massively parallel systems today are organized hierarchically, and different levels of the hierarchy differ in the organization of parallelism and the memory models they adopt. To make better use of these machines, it is clear that loop nests should be tiled hierarchically to fit the hierarchical organization of the machine; however, it is not so clear what should be the exact form of these hierarchical tiles. In particular, tile shape selection is of critical importance to expose parallelism of the tiled loop nests. Although loop tiling is a well-known optimization, not much is known about tile shape selection. In this article, we study tile shape selection when the shapes are any type of parallelograms and introduce a model to relate the tile shape of the hierarchy to the execution time. Using this model, we implement a system that automatically finds the tile shapes that minimize the execution time in a hierarchical system. Our experimental results show that in several cases, the tiles automatically selected by our system outperform the most intuitive tiling schemes usually adopted by programmers because of their simplicity.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call