Abstract

Despite the success of hybrid data address and value prediction in increasing the accuracy and coverage of data prefetching, memory access latency remains an important bottleneck to system performance. Careful study shows that about half of all cache misses are due to data references whose access patterns can be predicted accurately. Furthermore, overall cache effectiveness is bounded by the behavior of unpredictable data references in the cache. In this paper, we propose a set of four load-balancing techniques to address this memory latency problem. The first two, sequential unification and aggressive lookahead, are mainly used to reduce the chance of partial hits and the cancellation of accurate prefetch requests. The latter two, default prefetching and cache partitioning, optimize the cache performance of unpredictable references. The resulting cache, called the LBD (load-balancing data) cache, shows superior performance over a wide range of applications. Simulation of the LBD cache with RPT prefetching (the reference prediction table of Chen and Baer, one of the most cited selective data prefetch schemes) on SPEC95 showed a significant reduction in data reference latency, ranging from about 20% to over 90%, with an average of 55.89%. By comparison, prefetch-on-miss and RPT alone achieve average latency reductions of only 17.37% and 26.05%, respectively.
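
For context, the sketch below illustrates the kind of per-load bookkeeping an RPT-style stride prefetcher performs, since the paper's techniques are built around such a predictor. The entry fields and state names follow Chen and Baer's published description, but the table size, state transitions (simplified here), and all identifiers (rpt_update, RPT_SIZE, etc.) are illustrative assumptions, not the authors' implementation.

    /* Sketch of a reference prediction table (RPT) entry in the style of
     * Chen and Baer. Field names and sizes are illustrative assumptions. */
    #include <stdint.h>

    #define RPT_SIZE 512  /* assumed table size */

    typedef enum { INITIAL, TRANSIENT, STEADY, NO_PRED } rpt_state_t;

    typedef struct {
        uint64_t    tag;        /* PC of the load instruction */
        uint64_t    prev_addr;  /* last effective address seen */
        int64_t     stride;     /* last observed stride */
        rpt_state_t state;      /* prediction confidence state */
    } rpt_entry_t;

    static rpt_entry_t rpt[RPT_SIZE];

    /* Update the entry for a load at `pc` touching `addr`; returns the
     * predicted next address, or 0 when no prefetch should be issued. */
    uint64_t rpt_update(uint64_t pc, uint64_t addr)
    {
        rpt_entry_t *e = &rpt[pc % RPT_SIZE];

        if (e->tag != pc) {                    /* allocate a new entry */
            e->tag = pc;
            e->prev_addr = addr;
            e->stride = 0;
            e->state = INITIAL;
            return 0;
        }

        int64_t stride = (int64_t)(addr - e->prev_addr);
        int correct = (stride == e->stride);

        /* Simplified version of the published state machine: repeated
         * correct strides move toward STEADY, mispredictions move away. */
        switch (e->state) {
        case INITIAL:   e->state = correct ? STEADY    : TRANSIENT; break;
        case TRANSIENT: e->state = correct ? STEADY    : NO_PRED;   break;
        case STEADY:    e->state = correct ? STEADY    : INITIAL;   break;
        case NO_PRED:   e->state = correct ? TRANSIENT : NO_PRED;   break;
        }

        if (!correct)
            e->stride = stride;
        e->prev_addr = addr;

        /* Prefetch only when the entry is confident (STEADY). */
        return (e->state == STEADY) ? addr + e->stride : 0;
    }

The LBD techniques proposed in the paper then act around such a predictor: sequential unification and aggressive lookahead keep the accurate, predicted streams flowing, while default prefetching and cache partitioning handle the references the predictor cannot cover.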
