FLARE: Flexibly Sharing Commodity GPUs to Enforce QoS and Improve Utilization

Wei Han, Chen Tian, Bo Wu, Lin Ma, Daniel Mawhirter

doi:10.1007/978-3-030-72789-5_3

Abstract

A modern GPU integrates tens of streaming multiprocessors (SMs) on the chip. When used in data centers, GPUs often suffer from under-utilization because they are reserved for exclusive access, which motivates multitasking (i.e., co-running applications) to reduce the total cost of ownership. However, latency-critical applications may experience too much interference to meet their Quality-of-Service (QoS) targets. In this paper, we propose a software system, FLARE, that spatially shares commodity GPUs between latency-critical and best-effort applications to enforce QoS while maximizing overall throughput. By transforming the kernels of best-effort applications, FLARE enables both SM partitioning and thread block partitioning within an SM for co-running applications. It combines a microbenchmark-guided static configuration search with an online dynamic search to locate an optimal or near-optimal resource-partitioning strategy. Evaluated on 11 benchmarks and 2 real-world applications, FLARE improves hardware utilization by an average of 1.39× compared to the preemption-based approach.
