Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance

O. Mutlu, Y. N. Patt, Hyesoon Kim

doi:10.1109/mm.2006.10

Abstract

Today's high-performance processors face main-memory latencies on the order of hundreds of processor clock cycles. As a result, even the most aggressive processors spend a significant portion of their execution time stalling and waiting for main-memory accesses to return data to the execution core. Runahead execution is a promising way to tolerate long main-memory latencies because it has modest hardware cost and doesn't significantly increase processor complexity. Runahead execution improves a processor's performance by speculatively pre-executing the application program while the processor services a long-latency (L2) data cache miss, instead of stalling the processor for the duration of the L2 miss. For runahead execution to be efficiently implemented in current or future high-performance processors, which will be energy-constrained, processor designers must develop techniques to reduce the extra instructions that runahead execution speculatively executes. Our solution to this problem includes both hardware and software mechanisms that are simple, implementable, and effective.
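The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the authors' simulator or mechanism. It assumes a one-instruction-per-cycle core, a fixed 200-cycle miss latency, and misses that are all independent of one another, so every miss reached during runahead is turned into a useful prefetch. The names `simulate`, `Result`, and `MISS_LATENCY` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Assumed main-memory latency in cycles (illustrative value, not from the paper).
MISS_LATENCY = 200


@dataclass
class Result:
    cycles: int    # total cycles to retire the trace
    executed: int  # instructions executed, including speculative runahead ones


def simulate(trace, runahead):
    """Toy model of a core with or without runahead execution.

    trace: list of bools, True means the instruction is a load that misses in L2.
    Every instruction costs 1 cycle; an unprefetched L2 miss adds MISS_LATENCY.
    Assumption: misses are independent, so runahead can prefetch all of them.
    """
    prefetched = set()
    cycles = 0
    executed = 0
    for i, is_miss in enumerate(trace):
        cycles += 1
        executed += 1
        if not is_miss or i in prefetched:
            continue  # hit, or a miss already covered by an earlier runahead period
        cycles += MISS_LATENCY  # the core waits for the miss either way
        if runahead:
            # Instead of idling, speculatively pre-execute the following instructions.
            window = trace[i + 1 : i + 1 + MISS_LATENCY]
            executed += len(window)  # extra (speculative) work
            for j, later_miss in enumerate(window, start=i + 1):
                if later_miss:
                    prefetched.add(j)  # this miss is overlapped with the current one
    return Result(cycles, executed)


# Two independent misses: runahead overlaps them and saves one full miss latency.
two_misses = [True, False, False, False, False, True, False, False, False, False]
print(simulate(two_misses, runahead=False))  # Result(cycles=410, executed=10)
print(simulate(two_misses, runahead=True))   # Result(cycles=210, executed=19)

# One miss only: runahead finds nothing to prefetch but still executes extra instructions.
one_miss = [True] + [False] * 9
print(simulate(one_miss, runahead=False))    # Result(cycles=210, executed=10)
print(simulate(one_miss, runahead=True))     # Result(cycles=210, executed=19)
```

In this simplified model the second trace shows the efficiency concern: the runahead period executes nine additional instructions without shortening execution time, which is the kind of extra work the proposed techniques aim to reduce.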

Full Text