Abstract

As one of the most popular accelerators, the graphics processing unit (GPU) has been extensively adopted worldwide. With the burst of new applications and the growing scale of data, co-running applications on limited GPU resources has become increasingly important because of the dramatic improvement it brings to overall system efficiency. Quality-of-service (QoS) support among concurrent general-purpose GPU (GPGPU) applications is currently one of the most active research topics. Prior efforts have focused on providing QoS support with either OS-level or device-level scheduling methods. Each of these methods has its own pros and cons and cannot independently cover all scheduling cases. In this paper, we propose a cooperative QoS scheduling scheme (C-QoS) that combines operating-system-level (OS-level) scheduling and device-level scheduling. The proposed scheme can control the progress of a kernel and provide thorough QoS support for concurrent applications on multitasking GPUs. Owing to its accurate management of the copy-engine and execution-engine resources, C-QoS achieves QoS goals 23.33% more often than state-of-the-art QoS support mechanisms. The results also demonstrate that the cooperative method achieves 17.27% higher system utilization than uncooperative methods.

Highlights

  • Major companies such as Google, Microsoft, and Tesla have adopted graphics processing units (GPUs) to boost rapid advances in burgeoning areas such as image recognition, speech processing, natural language processing, disease detection, and autonomous driving.

  • To fully exploit the merits of the two scheduling methods, we propose a cooperative QoS scheduling scheme (C-QoS) that jointly manipulates them, aiming to improve performance and provide more thorough QoS support for concurrent GPU applications.

  • Throughout this paper, we focus on hardware resource utilization within the streaming multiprocessor (SMP); improving it can increase thread-level parallelism (TLP) and raise GPU throughput.


Summary

INTRODUCTION

Major companies such as Google, Microsoft, and Tesla have adopted GPUs to boost rapid advances in burgeoning areas such as image recognition, speech processing, natural language processing, disease detection, and autonomous driving. Researchers have modified the GPU device driver and invoked system-call traps and APIs to schedule different types of GPU commands (memory copy, kernel execution, etc.) or to reorder kernels from different applications [1], [4]–[12]. These techniques are defined as OS-level scheduling methods in this work. Researchers have also proposed techniques [14]–[18] that dynamically partition GPU resources to provide QoS support among concurrent applications in a spatially multiplexed manner. These works focus either on sharing device resources at streaming-multiprocessor (SMP) granularity or on co-running multiple kernels within a single SMP. We propose a cooperative scheduling scheme (C-QoS) that combines the OS-level and device-level methods to provide thorough QoS support for multitasking GPUs and to improve overall system utilization. Its scheduling decisions are driven by the characteristics of the concurrent GPGPU applications and the runtime status of the overall system.
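To make the idea of OS-level command scheduling concrete, the following is a minimal, hypothetical Python sketch of the reordering step described above: GPU commands (memory copies and kernel launches) from concurrent applications are intercepted and re-issued so that the QoS-critical application's commands run ahead of best-effort ones, while each application's own commands stay in issue order. All class and function names here are illustrative; the paper's actual C-QoS policy additionally coordinates this OS-level reordering with device-level resource partitioning.

```python
from dataclasses import dataclass, field
from heapq import heappush, heappop
from itertools import count


@dataclass(order=True)
class Command:
    """One intercepted GPU command, ordered by (priority, issue sequence)."""
    priority: int                       # 0 = QoS app, 1 = best-effort app
    seq: int                            # global issue counter: FIFO tie-break
    app: str = field(compare=False)     # owning application (not compared)
    kind: str = field(compare=False)    # "memcpy" or "kernel" (not compared)


class CommandScheduler:
    """Illustrative OS-level reordering queue for intercepted GPU commands."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def submit(self, app, kind, qos=False):
        # QoS-critical commands get priority 0 so they sort ahead of
        # best-effort (priority 1) commands in the heap.
        heappush(self._heap, Command(0 if qos else 1, next(self._seq), app, kind))

    def drain(self):
        """Pop commands in the order they would be issued to the engines."""
        order = []
        while self._heap:
            cmd = heappop(self._heap)
            order.append((cmd.app, cmd.kind))
        return order
```

For example, if a best-effort application and a QoS application submit interleaved memcpy/kernel commands, `drain()` returns the QoS application's commands first, each application's commands remaining in their original order. A real implementation would issue commands asynchronously and would also have to bound how long best-effort commands can be delayed.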

BACKGROUND
MOTIVATION
SCHEDULING PROBLEM ANALYSIS
C-QoS SCHEDULING STRATEGY
VIII. CONCLUSION