Abstract

Graphics processing units (GPUs) are composed of a group of single-instruction, multiple-data (SIMD) streaming multiprocessors (SMs). GPUs efficiently execute highly data-parallel tasks through SIMD execution on the SMs. However, if the threads within a SIMD group take diverging control paths, all divergent paths are executed serially. In the worst case, every thread takes a different control path, and the highly parallel architecture is used serially by each thread. This control flow divergence problem is well known in GPU development; code transformation, memory access redirection, and data layout reorganization are commonly used to reduce the impact of divergence. These techniques attempt to eliminate divergence by grouping together threads or data to ensure identical behavior. However, prior efforts using these techniques neither model the performance impact of any particular divergence nor consider that complete elimination of divergence may not be possible. Thus, we analyze the performance impact of divergence and of potential thread regrouping algorithms that eliminate divergence or minimize the impact of remaining divergence. Finally, we develop a divergence optimization framework that analyzes and transforms the kernel at compile time and regroups the threads at runtime. For compute-bound applications, our proposed metrics achieve performance estimation accuracy within 6.2% of measured performance. Using these metrics, we develop thread regrouping algorithms that consider the impact of divergence and speed up these applications by $2.2\times$ on average on an NVIDIA GTX 480.
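To make the problem concrete, the following is a minimal CUDA sketch of our own (not taken from the paper; divergent_kernel, data, and n are hypothetical names) showing the divergence pattern the abstract describes: adjacent threads of the same 32-thread warp branch on their thread ID, so the hardware serializes the two paths and effective SIMD utilization is halved. Regrouping threads so that even-ID and odd-ID threads land in separate warps would remove this divergence.

// Threads in one warp take different branches; the warp executes
// both paths one after the other, masking off the inactive lanes.
__global__ void divergent_kernel(float *data, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;           // bounds check (uniform per warp at the tail)
    if (tid % 2 == 0) {
        data[tid] = data[tid] * 2.0f;   // even lanes take this path
    } else {
        data[tid] = data[tid] + 1.0f;   // odd lanes take this path, serialized
    }
}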
