Band-Pass Prefetching

Aswinkumar Sridharan, Andre Seznec, Biswabandan Panda

doi:10.1145/3090635

Abstract

In multi-core systems, an application’s prefetcher can interfere with the memory requests of other applications that use shared resources such as the last-level cache and memory bandwidth. To minimize prefetcher-caused interference, prior mechanisms dynamically control prefetcher aggressiveness at runtime. These mechanisms use several parameters to capture prefetch usefulness and prefetcher-caused interference, and base their aggressiveness control decisions on them. However, they do not capture the actual interference at the shared resources and most often lead to incorrect aggressiveness control decisions, which leaves scope for performance improvement. Toward this end, we propose a solution to manage prefetching in multi-core systems. In particular, we make two fundamental observations. First, a positive correlation exists between the accuracy of a prefetcher and the number of prefetch requests it generates relative to an application’s total (demand and prefetch) requests. Second, a strong positive correlation exists between the ratio of total prefetch to demand requests and the ratio of the average last-level cache miss service times of demand to prefetch requests. Building on these two observations, we propose Band-pass prefetching, a simple and low-overhead mechanism to effectively manage prefetchers in multi-core systems. Our solution consists of local and global prefetcher aggressiveness control components, which together keep the flow of prefetch requests within a range of prefetch-to-demand request ratios.
From our experiments on 16-core multi-programmed workloads on systems using stream prefetching, we observe that Band-pass prefetching achieves a 12.4% (geometric-mean) improvement in harmonic speedup over a baseline that implements no prefetching, while aggressive prefetching without prefetcher aggressiveness control and the state-of-the-art HPAC, P-FST, and CAFFEINE achieve 8.2%, 8.4%, 1.4%, and 9.7%, respectively. Further evaluation of Band-pass prefetching on systems using the AMPM prefetcher shows similar performance trends. For a 16-core system, Band-pass prefetching requires only a modest hardware cost of 239 bytes.
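
The band-pass idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the counter layout, the function names, both threshold values, and the simple on/off decision (rather than graded aggressiveness levels) are assumptions made here for illustration, since the abstract does not specify them. The local check uses the first observation (a low prefetch fraction suggests low accuracy), and the global check uses the second (a high system-wide prefetch-to-demand ratio suggests that demand requests are being delayed).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CoreCounters:
    """Per-core request counts for one sampling interval (hypothetical layout)."""
    demand: int
    prefetch: int


# Hypothetical thresholds; the abstract does not state the real values.
MIN_PREFETCH_FRACTION = 0.35   # local cut-off: below this, prefetches are assumed inaccurate
MAX_PREFETCH_TO_DEMAND = 1.0   # global cut-off: above this, interference is assumed


def prefetch_fraction(core: CoreCounters) -> float:
    """Prefetch requests relative to the core's total (demand + prefetch) requests."""
    total = core.demand + core.prefetch
    return core.prefetch / total if total else 0.0


def global_prefetch_to_demand(cores: List[CoreCounters]) -> float:
    """System-wide ratio of total prefetch requests to total demand requests."""
    total_demand = sum(c.demand for c in cores)
    total_prefetch = sum(c.prefetch for c in cores)
    if total_demand == 0:
        return float("inf") if total_prefetch else 0.0
    return total_prefetch / total_demand


def allow_prefetching(cores: List[CoreCounters]) -> List[bool]:
    """Return, per core, whether prefetching stays enabled for the next interval.

    A core keeps prefetching only if it passes both checks: its own prefetch
    fraction is high enough (local) and the system-wide prefetch-to-demand
    ratio is low enough (global).
    """
    global_ok = global_prefetch_to_demand(cores) <= MAX_PREFETCH_TO_DEMAND
    return [
        global_ok and prefetch_fraction(c) >= MIN_PREFETCH_FRACTION
        for c in cores
    ]
```

With these assumed thresholds, a call such as `allow_prefetching([CoreCounters(demand=100, prefetch=100), CoreCounters(demand=300, prefetch=30)])` keeps the first core's prefetcher enabled and disables the second, whose prefetch fraction falls below the local cut-off.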

Journal: ACM Transactions on Architecture and Code Optimization
Publication Date: Jun 28, 2017
Citations: 7

Similar Papers

Coordinated control of multiple prefetchers in multi-core systems
Eiman Ebrahimi ... Chang Joo Lee
12 Dec 2009

Prefetch-Aware DRAM Controllers
Chang Joo Lee ... Onur Mutlu
01 Nov 2008

Balanced Prefetching Aggressiveness Controller for NoC-based Multiprocessor
André Aziz ... Edna Barros
01 Sep 2014

Coordinating prefetching and STT-RAM based last-level cache management for multicore systems
Mengjie Mao ... Yiran Chen
02 May 2013