An Efficient Temporal Data Prefetcher for L1 Caches

Mohammad Bakhshalipour, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad

doi:10.1109/lca.2017.2654347

Abstract

Server workloads frequently incur L1-D cache misses and hence lose significant performance potential. One way to reduce the number of L1-D misses, or their effect, is data prefetching. As L1-D access sequences have high temporal correlation, temporal prefetching techniques are promising for L1 caches. State-of-the-art temporal prefetchers are effective at reducing the number of L1-D misses, but we observe a significant gap between what they offer and the available opportunity. This work aims to improve the effectiveness of temporal prefetching. To overcome the deficiencies of existing temporal prefetchers, we introduce Domino prefetching. The Domino prefetcher is a temporal prefetching technique that looks up the miss history to find the last occurrence of the last one or two L1-D miss addresses and prefetches from that point. We show that the Domino prefetcher captures more than 87 percent of the temporal opportunity at L1-D. Through evaluation of a 16-core processor on a set of server workloads, we show that the Domino prefetcher improves system performance by 26 percent (up to 56 percent).
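
The lookup idea in the abstract can be illustrated with a small software model. The sketch below is not the paper's hardware design: the class name `TemporalPrefetcherSketch`, the unbounded history list, the two dictionaries, and the `degree` parameter are all assumptions made for illustration. On each miss it first tries to match the last two miss addresses against the history, falls back to the last single miss address, and returns the addresses that followed the matched occurrence as prefetch candidates.

```python
class TemporalPrefetcherSketch:
    """Toy model of temporal prefetching keyed on the last one or two misses.

    Hypothetical illustration only: a real prefetcher uses bounded hardware
    tables, not unbounded Python containers.
    """

    def __init__(self, degree=2):
        self.degree = degree   # assumed: number of addresses to prefetch per miss
        self.history = []      # global sequence of miss addresses
        self.last_one = {}     # addr -> index of its last occurrence
        self.last_two = {}     # (prev, addr) -> index of addr's last occurrence after prev
        self.prev = None       # most recent miss address

    def on_miss(self, addr):
        """Record a miss and return the list of addresses to prefetch."""
        pos = None
        # Prefer the more specific two-address match.
        if self.prev is not None:
            pos = self.last_two.get((self.prev, addr))
        # Fall back to the last occurrence of this single address.
        if pos is None:
            pos = self.last_one.get(addr)

        prefetches = []
        if pos is not None:
            # Replay what followed the matched occurrence in the history.
            prefetches = self.history[pos + 1 : pos + 1 + self.degree]

        # Append the miss and update both indexes.
        idx = len(self.history)
        self.history.append(addr)
        self.last_one[addr] = idx
        if self.prev is not None:
            self.last_two[(self.prev, addr)] = idx
        self.prev = addr
        return prefetches


def replay(misses, degree=2):
    """Feed a miss sequence through the sketch; return prefetches per miss."""
    pf = TemporalPrefetcherSketch(degree)
    return [pf.on_miss(a) for a in misses]
```

For the miss sequence `0x10, 0x20, 0x30, 0x40, 0x20, 0x50, 0x10, 0x20`, the final miss on `0x20` follows `0x10`, so the two-address match selects the first occurrence of `0x20` (followed by `0x30, 0x40`) rather than the most recent one (followed by `0x50, 0x10`). This is the kind of ambiguity that a single-address lookup alone cannot resolve.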

Full Text