Abstract

The adoption of unified memory and demand paging has simplified programming and eased memory management on discrete GPUs. However, long-latency page faults cause significant performance overhead. Several software-based mechanisms have been proposed to address this issue, but they become inefficient when page prefetching and pre-eviction are combined. For example, a state-of-the-art page replacement policy, hierarchical page eviction (HPE), is inefficient when prefetching is enabled. Furthermore, the prefetcher-semantics-aware pre-eviction policy, which pre-evicts contiguous pages in bulk the same way they were brought in by the prefetcher, may cause thrashing for some irregular applications.

In this paper, coordinated page prefetch and eviction (CPPE) is proposed to manage memory oversubscription in GPUs with unified memory. CPPE couples a modified page eviction policy, MHPE, with an access-pattern-aware prefetcher in a fine-grained manner: MHPE is aware of prefetch semantics, and the prefetcher prefetches pages according to the access patterns within the eviction candidates selected by MHPE. Simulation results show that, when GPU memory is 75% and 50% oversubscribed, CPPE achieves average speedups of 1.56x and 1.64x, respectively (up to 10.97x), over the state-of-the-art baseline, which combines a sequential-local prefetcher with an LRU pre-eviction policy. CPPE also outperforms other approaches, including random/reserved LRU with the sequential-local prefetcher, and simply disabling prefetching under memory oversubscription.
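
To make the coordination concrete, below is a minimal, self-contained sketch of the idea the abstract describes, not the paper's actual implementation: a pattern-aware prefetcher and a recency-based evictor operating on shared state, with prefetching suppressed for irregular fault streams to avoid thrashing. All names (GPUMemory, plan_prefetch, handle_fault), the three-fault sequential-pattern test, and the single-page eviction granularity are illustrative assumptions; the paper's MHPE and access-pattern detection are more involved.

```python
from collections import OrderedDict, deque

class GPUMemory:
    """Toy GPU page table with fixed capacity; recency order stands in for MHPE."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page id -> True, ordered coldest-first

    def is_full(self):
        return len(self.pages) >= self.capacity

    def touch(self, page):
        self.pages.move_to_end(page)  # mark as recently used

    def insert(self, page):
        self.pages[page] = True

    def evict(self, count=1):
        # Hypothetical stand-in for MHPE's prefetch-semantics-aware choice:
        # evict the coldest pages, at a granularity matched to how the
        # prefetcher brought pages in.
        victims = list(self.pages)[:count]
        for v in victims:
            del self.pages[v]
        return victims

def plan_prefetch(fault_page, history, degree=4):
    # Hypothetical access-pattern test: prefetch the next `degree` pages only
    # when recent faults look sequential; treat anything else as irregular
    # and prefetch nothing, so bulk fetch/evict cannot thrash the working set.
    recent = list(history)[-3:]
    if len(recent) == 3 and all(b - a == 1 for a, b in zip(recent, recent[1:])):
        return [fault_page + i for i in range(1, degree + 1)]
    return []

def handle_fault(fault_page, mem, history):
    history.append(fault_page)
    for page in [fault_page] + plan_prefetch(fault_page, history):
        if page in mem.pages:
            mem.touch(page)
        else:
            if mem.is_full():
                mem.evict(count=1)
            mem.insert(page)

mem = GPUMemory(capacity=8)
history = deque(maxlen=16)
for fault in [0, 1, 2, 3, 40, 41, 42, 7]:  # sequential run, then irregular jumps
    handle_fault(fault, mem, history)
print(sorted(mem.pages))  # resident pages after the fault stream
```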
