Abstract

The adoption of unified memory and demand paging has simplified programming and eased memory management on discrete GPUs. However, long-latency page faults cause significant performance overhead. Several software-based mechanisms have been proposed to address this issue, but they become inefficient when page prefetching and pre-eviction are combined. For example, a state-of-the-art page replacement policy, hierarchical page eviction (HPE), is inefficient when prefetching is enabled. Furthermore, the prefetcher-semantics-aware pre-eviction policy, which pre-evicts contiguous pages in bulk the same way they were brought in by the prefetcher, may cause thrashing for some irregular applications.

In this paper, coordinated page prefetch and eviction (CPPE) is proposed to manage memory oversubscription in GPUs with unified memory. CPPE couples a modified page eviction policy, MHPE, with an access-pattern-aware prefetcher in a fine-grained manner: MHPE is aware of prefetch semantics, and the prefetcher prefetches pages according to the access patterns within the eviction candidates selected by MHPE. Simulation results show that, when GPU memory is 75% and 50% oversubscribed, CPPE achieves average speedups of 1.56x and 1.64x, respectively (up to 10.97x), over the state-of-the-art baseline, which combines a sequential-local prefetcher with an LRU pre-eviction policy. CPPE also outperforms other approaches, including random/reserved LRU with the sequential-local prefetcher, and simply disabling prefetching under memory oversubscription.
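
To make the coordination concrete, below is a minimal, self-contained sketch of the idea the abstract describes, not the paper's actual implementation: a pattern-aware prefetcher and a recency-based evictor operating on shared state, with prefetching suppressed for irregular fault streams to avoid thrashing. All names (GPUMemory, plan_prefetch, handle_fault), the three-fault sequential-pattern test, and the single-page eviction granularity are illustrative assumptions; the paper's MHPE and access-pattern detection are more involved.

```python
from collections import OrderedDict, deque

class GPUMemory:
    """Toy GPU page table with fixed capacity; recency order stands in for MHPE."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page id -> True, ordered coldest-first

    def is_full(self):
        return len(self.pages) >= self.capacity

    def touch(self, page):
        self.pages.move_to_end(page)  # mark as recently used

    def insert(self, page):
        self.pages[page] = True

    def evict(self, count=1):
        # Hypothetical stand-in for MHPE's prefetch-semantics-aware choice:
        # evict the coldest pages, at a granularity matched to how the
        # prefetcher brought pages in.
        victims = list(self.pages)[:count]
        for v in victims:
            del self.pages[v]
        return victims

def plan_prefetch(fault_page, history, degree=4):
    # Hypothetical access-pattern test: prefetch the next `degree` pages only
    # when recent faults look sequential; treat anything else as irregular
    # and prefetch nothing, so bulk fetch/evict cannot thrash the working set.
    recent = list(history)[-3:]
    if len(recent) == 3 and all(b - a == 1 for a, b in zip(recent, recent[1:])):
        return [fault_page + i for i in range(1, degree + 1)]
    return []

def handle_fault(fault_page, mem, history):
    history.append(fault_page)
    for page in [fault_page] + plan_prefetch(fault_page, history):
        if page in mem.pages:
            mem.touch(page)
        else:
            if mem.is_full():
                mem.evict(count=1)
            mem.insert(page)

mem = GPUMemory(capacity=8)
history = deque(maxlen=16)
for fault in [0, 1, 2, 3, 40, 41, 42, 7]:  # sequential run, then irregular jumps
    handle_fault(fault, mem, history)
print(sorted(mem.pages))  # resident pages after the fault stream
```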
