Abstract

As one of the most popular accelerators, the graphics processing unit (GPU) has been extensively adopted worldwide. With the burst of new applications and the growing scale of data, co-running applications on limited GPU resources has become increasingly important because of the dramatic improvement it brings to overall system efficiency. Quality-of-service (QoS) support among concurrent general-purpose GPU (GPGPU) applications is currently one of the most active research topics. Prior efforts have focused on providing QoS support with either OS-level or device-level scheduling methods. Each of these methods has its own pros and cons and cannot independently cover all scheduling cases. In this paper, we propose a cooperative QoS scheduling scheme (C-QoS) that combines operating-system-level (OS-level) scheduling and device-level scheduling. The proposed scheme can control the progress of a kernel and provide thorough QoS support for concurrent applications on multitasking GPUs. Owing to its accurate management of the copy-engine and execution-engine resources, C-QoS achieves QoS goals 23.33% more often than state-of-the-art QoS support mechanisms. The results also demonstrate that the cooperative method achieves 17.27% higher system utilization than uncooperative methods.

Highlights

  • Major companies such as Google, Microsoft, and Tesla have adopted graphics processing units (GPUs) to boost rapid advances in burgeoning areas such as image recognition, speech processing, natural language processing, disease detection, and autonomous driving.

  • To fully exploit the merits of the two scheduling methods, we propose a cooperative QoS scheduling scheme (C-QoS) that jointly manipulates them, aiming to improve performance and provide more thorough QoS support for concurrent GPU applications.

  • Throughout this paper, we focus on hardware resource utilization within the streaming multiprocessor (SMP); improving it can increase thread-level parallelism (TLP) and raise GPU throughput.


Summary

INTRODUCTION

Major companies such as Google, Microsoft, and Tesla have adopted GPUs to boost rapid advances in burgeoning areas such as image recognition, speech processing, natural language processing, disease detection, and autonomous driving. Researchers have modified the GPU device driver and invoked system-call traps and APIs to schedule different types of GPU commands (memory copy, kernel execution, etc.) or to reorder kernels from different applications [1], [4]–[12]. These techniques are defined as OS-level scheduling methods in this work. Researchers have also proposed techniques [14]–[18] that dynamically partition GPU resources to provide QoS support among concurrent applications in a spatially multiplexed manner. These works focus either on sharing device resources at streaming-multiprocessor (SMP) granularity or on co-running multiple kernels within a single SMP. We propose a cooperative scheduling scheme (C-QoS) that combines the OS-level and device-level methods to provide thorough QoS support for multitasking GPUs and to improve overall system utilization. Its scheduling decisions are driven by the characteristics of the concurrent GPGPU applications and the runtime status of the overall system.
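To make the idea of OS-level command scheduling concrete, the following is a minimal, hypothetical Python sketch of the reordering step described above: GPU commands (memory copies and kernel launches) from concurrent applications are intercepted and re-issued so that the QoS-critical application's commands run ahead of best-effort ones, while each application's own commands stay in issue order. All class and function names here are illustrative; the paper's actual C-QoS policy additionally coordinates this OS-level reordering with device-level resource partitioning.

```python
from dataclasses import dataclass, field
from heapq import heappush, heappop
from itertools import count


@dataclass(order=True)
class Command:
    """One intercepted GPU command, ordered by (priority, issue sequence)."""
    priority: int                       # 0 = QoS app, 1 = best-effort app
    seq: int                            # global issue counter: FIFO tie-break
    app: str = field(compare=False)     # owning application (not compared)
    kind: str = field(compare=False)    # "memcpy" or "kernel" (not compared)


class CommandScheduler:
    """Illustrative OS-level reordering queue for intercepted GPU commands."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def submit(self, app, kind, qos=False):
        # QoS-critical commands get priority 0 so they sort ahead of
        # best-effort (priority 1) commands in the heap.
        heappush(self._heap, Command(0 if qos else 1, next(self._seq), app, kind))

    def drain(self):
        """Pop commands in the order they would be issued to the engines."""
        order = []
        while self._heap:
            cmd = heappop(self._heap)
            order.append((cmd.app, cmd.kind))
        return order
```

For example, if a best-effort application and a QoS application submit interleaved memcpy/kernel commands, `drain()` returns the QoS application's commands first, each application's commands remaining in their original order. A real implementation would issue commands asynchronously and would also have to bound how long best-effort commands can be delayed.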

BACKGROUND
MOTIVATION
SCHEDULING PROBLEM ANALYSIS
C-QoS SCHEDULING STRATEGY
VIII. CONCLUSION