Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling

Jianlong Zhong, Bingsheng He

doi:10.1109/tpds.2013.257

Abstract

Graphics processors (GPUs) are now widely used as accelerators in shared environments such as clusters and clouds. In such environments, many kernels from different users are submitted to the GPUs, and throughput is an important metric for both performance and total cost of ownership. Despite recent improvements in runtime support for concurrent GPU kernel execution, the GPU can still be severely underutilized, resulting in suboptimal throughput. In this paper, we propose Kernelet, a runtime system that improves the throughput of concurrent kernel executions on the GPU. Kernelet incorporates transparent memory management and PCIe data transfer techniques, as well as dynamic slicing and scheduling techniques for kernel execution. With slicing, Kernelet divides a GPU kernel into multiple sub-kernels, called slices. Each slice has tunable occupancy, which allows it to be co-scheduled with other slices for high GPU utilization. We develop a novel Markov chain-based performance model to guide scheduling decisions. Our experimental results demonstrate performance improvements of up to 31 percent on the NVIDIA Tesla C2050 and up to 23 percent on the GTX680.
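The slicing idea can be illustrated with a small CPU-only sketch in Python. It assumes a kernel can be modeled as a function that performs the work of one thread block given its block index, and that a slice is launched with a smaller grid plus a block-index offset so each block recovers its original ID. The names `Slice`, `make_slices`, and `co_schedule` are hypothetical, and the round-robin interleaving is a deliberately simple stand-in for Kernelet's Markov chain-based scheduler, not the paper's method.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model of a GPU kernel: a function doing one thread block's
# work, given that block's original (global) block index.
BlockFn = Callable[[int], None]


@dataclass
class Slice:
    """A sub-kernel covering original block indices [offset, offset + size)."""
    kernel_name: str
    block_fn: BlockFn
    offset: int
    size: int

    def launch(self) -> None:
        # A real slice would run as a grid of `size` blocks; each block adds
        # `offset` to its local index to recover its original block ID.
        for local_block in range(self.size):
            self.block_fn(self.offset + local_block)


def make_slices(name: str, block_fn: BlockFn, grid_size: int, slice_size: int) -> List[Slice]:
    """Split a grid of `grid_size` blocks into slices of at most `slice_size` blocks."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    return [
        Slice(name, block_fn, start, min(slice_size, grid_size - start))
        for start in range(0, grid_size, slice_size)
    ]


def co_schedule(*kernels: List[Slice]) -> List[Slice]:
    """Interleave slices from several kernels round-robin (a simple policy,
    NOT the model-guided scheduling described in the paper)."""
    queues = [list(k) for k in kernels]
    order: List[Slice] = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.pop(0))
    return order


# Usage: two toy "kernels" of 10 blocks x 4 threads each.
N_BLOCKS, BLOCK_DIM = 10, 4
a = [0] * (N_BLOCKS * BLOCK_DIM)
b = [0] * (N_BLOCKS * BLOCK_DIM)


def kernel_a(block: int) -> None:
    # Each simulated thread writes its global thread index.
    for t in range(BLOCK_DIM):
        i = block * BLOCK_DIM + t
        a[i] = i


def kernel_b(block: int) -> None:
    # Each simulated thread writes twice its global thread index.
    for t in range(BLOCK_DIM):
        i = block * BLOCK_DIM + t
        b[i] = 2 * i


# Different slice sizes play the role of different per-slice occupancy.
schedule = co_schedule(
    make_slices("A", kernel_a, N_BLOCKS, 3),
    make_slices("B", kernel_b, N_BLOCKS, 4),
)
for s in schedule:
    s.launch()
```

Assuming the blocks of a kernel are independent of one another, each block's result depends only on its recovered original index, so slicing leaves the kernel's output unchanged whatever slice size is chosen. That freedom is what lets a runtime pick slice sizes to balance co-scheduled kernels.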

Journal: IEEE Transactions on Parallel and Distributed Systems
Publication Date: Jun 1, 2014
Citations: 128

Similar Papers

Slicing Executable System-of-Systems Models for Efficient Statistical Verification
Jiyoung Song ... Sangwon Hyun
01 May 2019

Event-aware precise dynamic slicing for automatic debugging of Android applications
Hsu Myat Win ... Yulei Sui
Journal of Systems and Software | VOL. 198
07 Jan 2023

Analyzing and Estimating the Performance of Concurrent Kernels Execution on GPUs
Rommel Cruz ... Esteban Clua
17 Oct 2017

Mandoline: Dynamic Slicing of Android Applications with Trace-Based Alias Analysis
Khaled Ahmed ... Mieszko Lis
01 Apr 2021
