Abstract

We present a novel compile-time method for determining the cache performance of the loop nests in a program. The cache hit rates are produced by applying the reference string, determined during compilation, to an architecturally parameterized cache simulator. We also describe a heuristic that uses this method for compile-time optimization of loop ranges in iteration-space blocking. The results of the loop optimizations are presented for different parallel program benchmarks and various processor architectures, such as the IBM SP1 RS/6000, the SuperSPARC, and the Intel i860.
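
As a rough illustration of the idea, and not the simulator used in this work, the sketch below replays the reference string of a simple loop nest through a cache model parameterized by capacity, line size, and associativity. The function names `hit_rate` and `loop_nest_refs`, the LRU replacement policy, and the example parameters are all assumptions made for this sketch.

```python
from collections import OrderedDict


def hit_rate(addresses, cache_bytes, line_bytes, ways):
    """Replay a reference string through a set-associative cache.

    LRU replacement is an assumption of this sketch; the abstract does not
    say which policy the actual simulator models.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    # One OrderedDict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = total = 0
    for addr in addresses:
        line = addr // line_bytes
        cache_set = sets[line % num_sets]
        total += 1
        if line in cache_set:
            hits += 1
            cache_set.move_to_end(line)        # mark as most recently used
        else:
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[line] = True
    return hits / total if total else 0.0


def loop_nest_refs(n, elem_bytes=8, row_major=True):
    """Byte addresses touched by a doubly nested loop reading an n x n array.

    The array is assumed to be stored row by row starting at address 0.
    With row_major=False the loop reads A[j][i], i.e. it walks down columns.
    """
    for i in range(n):
        for j in range(n):
            r, c = (i, j) if row_major else (j, i)
            yield (r * n + c) * elem_bytes


# Hypothetical cache: 1 KiB, 32-byte lines, 2-way set associative.
row = hit_rate(loop_nest_refs(64, row_major=True), 1024, 32, 2)
col = hit_rate(loop_nest_refs(64, row_major=False), 1024, 32, 2)
print(row, col)
```

With these assumed parameters, the row-order traversal hits on three of every four references (hit rate 0.75), while the column-order traversal maps each column onto a single set and never hits (hit rate 0.0). Changing the three cache arguments models a different architecture without changing the reference string.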
