Abstract

Emerging software stacks process ever-increasing amounts of data, straining the virtual memory layer of modern computer systems. In particular, address translation has become an acute system performance bottleneck. In response, we propose a class of cache prefetchers triggered by page table walk (PTW) activity. Our scheme, translation-enabled memory prefetching optimizations (TEMPO), hinges on two observations. First, a substantial fraction of DRAM references in modern big-data workloads are devoted to accessing page tables (PTs). Second, when memory references require PT lookups in DRAM, the majority also access DRAM for the subsequent data reference. TEMPO exploits these observations to prefetch into the cache the data pointed to by the PT entry. TEMPO requires only modest hardware changes and no OS or application-level modifications. Overall, TEMPO improves performance by 10-30 percent and reduces energy consumption by 1-14 percent.
