Abstract

Hardware support for fault-driven page migration and on-demand memory allocation, along with advancements in the unified memory runtime of modern graphics processing units (GPUs), simplifies memory management in discrete CPU-GPU heterogeneous memory systems and improves programmability. GPUs are widely adopted to accelerate general-purpose applications and are now an integral part of heterogeneous computing platforms ranging from supercomputers to commodity cloud platforms. However, data-intensive applications face the challenge of device-memory oversubscription, as the limited capacity of bandwidth-optimized GPU memory cannot accommodate their growing working sets. The performance overhead under memory oversubscription stems from the thrashing of memory pages over the slow CPU-GPU interconnect. Because computation and memory access patterns vary widely across applications, each application demands special attention from memory management. As a result, the responsibility of effectively utilizing the plethora of memory management techniques supported by GPU programming libraries and runtimes falls squarely on the application programmer. This paper presents a smart runtime that leverages fault and page-migration information to detect underlying patterns in CPU-GPU interconnect traffic. Based on this online workload characterization, the extended unified memory runtime dynamically chooses and employs a suitable policy from a wide array of memory management strategies to mitigate the cost of memory oversubscription. Experimental evaluation shows that this smart adaptive runtime provides 18% and 30% (geometric mean) performance improvements across all benchmarks over the default unified memory runtime under 125% and 150% device-memory oversubscription, respectively.
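For context, the sketch below illustrates the kind of programmer-directed memory management the abstract refers to: a managed (unified memory) allocation with static per-allocation hints and an explicit prefetch issued through the standard CUDA runtime API. The kernel, buffer size, and hint choices are illustrative assumptions, not taken from the paper; the proposed smart runtime would instead select such policies automatically from observed fault and migration traffic.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel: streams through a managed buffer once.
__global__ void scale(float *x, float alpha, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) x[i] *= alpha;
}

int main() {
    int dev = 0;
    cudaGetDevice(&dev);

    size_t n = 1ull << 28;                         // ~1 GiB of floats (illustrative size)
    float *x = nullptr;
    cudaMallocManaged(&x, n * sizeof(float));      // on-demand, migratable allocation
    for (size_t i = 0; i < n; ++i) x[i] = 1.0f;    // first touch happens on the CPU

    // Static, programmer-supplied policy: prefer GPU residency for this buffer
    // and prefetch it before the kernel to avoid demand-paging faults.
    cudaMemAdvise(x, n * sizeof(float), cudaMemAdviseSetPreferredLocation, dev);
    cudaMemPrefetchAsync(x, n * sizeof(float), dev, 0);

    scale<<<(unsigned)((n + 255) / 256), 256>>>(x, 2.0f, n);
    cudaDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);                   // access faults pages back to the CPU
    cudaFree(x);
    return 0;
}
```

Under oversubscription, fixing such hints by hand for every allocation and access phase is exactly the burden the adaptive runtime aims to remove.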
