Abstract
Caches have become increasingly important with the widening gap between main memory and processor speeds. Small and fast cache memories are designed to bridge this discrepancy. However, they are only effective when programs exhibit sufficient data locality.The performance of the memory hierarchy can be improved by means of data and loop transformations. Tiling is a loop transformation that aims at reducing capacity misses by shortening the reuse distance. Padding is a data layout transformation targeted to reduce conflict misses.This article presents an accurate cost model that describes misses across different hierarchy levels and considers the effects of other hardware components such as branch predictors. The cost model drives the application of tiling and padding transformations. We combine the cost model with a genetic algorithm to compute the tile and pad factors that enhance the program performance.To validate our strategy, we ran experiments for a set of benchmarks on a large set of modern architectures. Our results show that this scheme is useful to optimize programs' performance. When compared to previous approaches, we observe that with a reasonable compile-time overhead, our approach gives significant performance improvements for all studied kernels on all architectures.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: ACM Transactions on Programming Languages and Systems
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.