Abstract

GPGPU (General-Purpose Graphics Processing Unit) consists of hardware resources that can execute tens of thousands of threads simultaneously. In reality, however, this parallelism is limited because resources are allocated in units of thread blocks, which are not managed judiciously in current GPGPU systems. To schedule threads in GPGPU, a specialized hardware scheduler allocates thread blocks to the computing units called SMs (Streaming Multiprocessors) in a Round-Robin manner. Although scheduling in hardware is simple and fast, we observe that Round-Robin scheduling is not efficient in GPGPU, as it does not consider the workload characteristics of threads or the resource balance among SMs. In this article, we present a new thread block scheduling model that can analyze and quantify the performance of thread block scheduling policies. We implement our model as a GPGPU scheduling simulator and show that the conventional thread block scheduler provided in GPGPU hardware does not perform well as the workload becomes heavy. Specifically, we observe that the performance degradation of Round-Robin can be eliminated by adopting DFA (Depth First Allocation), which is simple yet scalable. Moreover, as our simulator has a modular design and is publicly available for other researchers to use, various scheduling policies can be incorporated into it to evaluate the performance of GPGPU schedulers.
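
The contrast between the two policies named in the abstract can be illustrated with a small allocator sketch. The following Python snippet is a minimal illustration written for this summary, not the authors' simulator: it assumes a per-SM limit on resident threads (the only resource modeled here) and illustrative block sizes, and shows how Round-Robin spreads blocks across SMs while Depth First Allocation fills one SM before moving to the next.

from dataclasses import dataclass, field

MAX_THREADS_PER_SM = 2048  # assumed per-SM thread limit (illustrative)

@dataclass
class SM:
    used_threads: int = 0
    blocks: list = field(default_factory=list)

    def can_host(self, block_threads: int) -> bool:
        # a block fits only if the SM stays within its thread limit
        return self.used_threads + block_threads <= MAX_THREADS_PER_SM

    def allocate(self, block_id: int, block_threads: int) -> None:
        self.used_threads += block_threads
        self.blocks.append(block_id)

def round_robin(num_sms: int, block_sizes: list) -> list:
    """Cycle over SMs, placing one block per SM in turn."""
    sms = [SM() for _ in range(num_sms)]
    cursor = 0
    for bid, size in enumerate(block_sizes):
        # probe SMs starting from the cursor until one can host the block
        for step in range(num_sms):
            sm = sms[(cursor + step) % num_sms]
            if sm.can_host(size):
                sm.allocate(bid, size)
                cursor = (cursor + step + 1) % num_sms
                break
    return sms

def depth_first(num_sms: int, block_sizes: list) -> list:
    """Fill one SM as far as its resources allow before moving on (DFA-style)."""
    sms = [SM() for _ in range(num_sms)]
    current = 0
    for bid, size in enumerate(block_sizes):
        while current < num_sms and not sms[current].can_host(size):
            current += 1
        if current == num_sms:
            break  # no SM can host the block; on real hardware it would wait
        sms[current].allocate(bid, size)
    return sms

if __name__ == "__main__":
    sizes = [256] * 12  # twelve blocks of 256 threads each (illustrative)
    for name, policy in (("Round-Robin", round_robin), ("DFA", depth_first)):
        print(name, [sm.blocks for sm in policy(4, sizes)])

Running the sketch shows Round-Robin interleaving the blocks over all four SMs, whereas DFA packs the first SM to its limit before touching the second, which is the behavior the article attributes to Depth First Allocation.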

Highlights

  • With the rapid advances in many-core hardware technologies, GPGPU (General-Purpose GPU) has expanded its area from graphical processing to various parallel processing jobs

  • We implement our model as a GPGPU scheduling simulator and show that the conventional thread block scheduling provided in GPGPU hardware does not perform well as the workload becomes heavy

  • We presented a scheduler model for the thread block scheduling in GPGPU and implemented it as a publicly available, modular scheduling simulator

Introduction

With the rapid advances in many-core hardware technologies, GPGPU (General-Purpose GPU) has expanded its area from graphical processing to various parallel processing jobs. We observe that Round-Robin is not efficient in GPGPU as it does not consider the workload characteristics of threads and the resource balance among SMs. In this article, we present a new thread block scheduling model that can analyze and quantify the performance of thread block scheduling. We show that the conventional thread block scheduler provided in GPGPU hardware does not perform well as the workload becomes heavy, and that the performance degradation of Round-Robin can be eliminated. The maximum number of threads that can be executed per SM is limited, and the scheduler keeps the number of threads per SM from exceeding this limit while allocating thread blocks to SMs [14]. Each thread block consists of at least one thread, and a block can be allocated to an SM only when every type of resource it requires is available; for example, even though the overall resource utilization of an SM is not high, a thread block cannot be scheduled on it if the specific type of resource to be used is exhausted. As our simulator is modular, various scheduling policies can be incorporated into it for evaluating their performance. The two types of memory have a tradeoff between latency and capacity, so they should be utilized deliberately, considering the data input size and copy overhead [19,20].
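
The admission rule sketched above, where a block fits on an SM only if every resource type it needs is still available, can be made concrete as follows. The Python snippet below is an illustrative sketch rather than the paper's model; the capacity figures for threads, registers, and shared memory are assumptions chosen only to show how a lightly utilized SM can still reject a block when one resource type runs out.

from dataclasses import dataclass

@dataclass
class SMCapacity:
    threads: int = 2048          # max resident threads per SM (assumed)
    registers: int = 65536       # 32-bit registers per SM (assumed)
    shared_mem: int = 48 * 1024  # bytes of shared memory per SM (assumed)

@dataclass
class BlockDemand:
    threads: int
    registers_per_thread: int
    shared_mem: int

def fits(free: SMCapacity, block: BlockDemand) -> bool:
    """A block is admissible only if all resource types fit simultaneously."""
    return (block.threads <= free.threads
            and block.threads * block.registers_per_thread <= free.registers
            and block.shared_mem <= free.shared_mem)

# Example: the SM still has plenty of free threads and registers, but the
# block's shared-memory demand alone makes it inadmissible.
free = SMCapacity(threads=1536, registers=40000, shared_mem=8 * 1024)
block = BlockDemand(threads=256, registers_per_thread=32, shared_mem=16 * 1024)
print(fits(free, block))  # False: shared memory is the limiting resource type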

GPGPU and Thread Block Model
Evaluation
Conclusions