PILOT: a Runtime System to Manage Multi-tenant GPU Unified Memory Footprint

John Ravi, Huiyang Zhou, Michela Becchi, Tri Nguyen

doi:10.1109/hipc53243.2021.00063

Abstract

Concurrent kernel execution on GPUs has proven to be an effective technique for improving system throughput by maximizing resource utilization. To increase programmability and meet the growing memory requirements of data-intensive applications, current GPUs support Unified Virtual Memory (UVM), which provides a virtual memory abstraction with demand paging. By allowing applications to oversubscribe GPU memory, UVM creates more opportunities to share GPU resources across applications. However, when applications have competing memory requirements, GPU sharing can degrade performance due to thrashing. NVIDIA's Multi-Process Service (MPS) can space-share bare-metal GPUs, enabling cluster workload managers such as Slurm to share a single GPU across MPI ranks, with limited control over resource partitioning. However, MPS cannot preempt, schedule, or throttle a running GPU process. These features would enable new OS-managed scheduling policies for GPU kernels that dynamically handle resource contention and offer consistent performance. The contribution of this paper is twofold. First, we show how memory oversubscription can impact the performance of concurrent GPU applications. Second, we propose three methods to transparently mitigate memory interference through kernel preemption and scheduling policies. To implement our policies, we develop our own runtime system (PILOT) to serve as an alternative to NVIDIA's MPS. In the presence of memory oversubscription, we observed a dramatic improvement in overall throughput when using our scheduling policies and runtime hints.
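
The thrashing effect described in the abstract can be illustrated with a toy page-fault model. The sketch below is not PILOT's implementation and does not model the UVM driver. It assumes LRU eviction, a shared memory of six pages, and two hypothetical tenants that each sweep repeatedly over a four-page working set. All names (`count_faults`, `sweep`, `interleave`, `CAPACITY`) are illustrative.

```python
from collections import OrderedDict


def count_faults(accesses, capacity):
    """Count page faults for an access stream, assuming LRU eviction."""
    resident = OrderedDict()  # page -> True, ordered from least to most recently used
    faults = 0
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)  # hit: mark as most recently used
        else:
            faults += 1
            if len(resident) >= capacity:
                resident.popitem(last=False)  # evict the least recently used page
            resident[page] = True
    return faults


def sweep(tenant, working_set, passes):
    """Access stream of one tenant: repeated linear sweeps over its own pages."""
    return [(tenant, p) for _ in range(passes) for p in range(working_set)]


def interleave(a, b):
    """Alternate accesses of two tenants to mimic concurrent execution."""
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


CAPACITY = 6  # assumed shared memory size, in pages
a = sweep("A", working_set=4, passes=5)
b = sweep("B", working_set=4, passes=5)

# Each tenant fits alone (4 <= 6), but together they oversubscribe memory (8 > 6).
concurrent_faults = count_faults(interleave(a, b), CAPACITY)
serialized_faults = count_faults(a + b, CAPACITY)

print("concurrent:", concurrent_faults)  # 40: every access faults
print("serialized:", serialized_faults)  # 8: only cold misses
```

In this model, running the tenants one after the other (roughly what a preemption-based scheduler could enforce) leaves only the cold misses, while interleaving them makes every access fault. Real workloads and the actual eviction policy will behave differently; the numbers only show the direction of the effect.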
