Pointer-Based Divergence Analysis for OpenCL 2.0 Programs

Shao-Chung Wang, Yuan-Shin Hwang, Li-An Her, Jenq-Kuen Lee, Lin-Ya Yu

doi:10.1145/3470644

Abstract

A modern GPU is designed with many large thread groups to achieve high throughput and performance. Within these groups, threads are organized into fixed-size SIMD batches in which the same instruction is applied to vectors of data in lockstep. This architecture suits applications with a high degree of data parallelism, but its performance degrades seriously when divergence occurs. Many optimizations for divergence have been proposed, and they depend on divergence information about variables and branches. A previous analysis scheme treated pointers and function return values as divergent outright and focused only on OpenCL 1.x. In this article, we present a novel scheme that reports divergence information for pointer-intensive OpenCL programs. The approach is based on extended static single assignment (SSA) form and adds special functions and annotations from memory SSA and gated SSA. The proposed scheme first constructs the extended SSA form, which is then used to build a divergence relation graph that includes all possible points-to relationships of the pointers together with the initial divergence states. The divergence state of each pointer is then determined by propagating divergence states through the divergence relation graph. The scheme is further extended to interprocedural cases by considering function-related statements. The proposed scheme was implemented in an LLVM compiler and can be applied to OpenCL programs. We analyzed 10 programs with 24 kernels, totaling 1,306 instructions in LLVM intermediate representation, with 885 variables, 108 branches, and 313 pointer-related statements. The total number of divergent pointers detected was 146 for the proposed scheme, 200 for the scheme in which a pointer is always divergent, and 155 for the current LLVM default scheme; the corresponding totals were 458, 519, and 482 divergent variables and 31, 34, and 32 divergent branches. These experimental results indicate that the proposed scheme is more precise than both the always-divergent scheme and the current LLVM default scheme.
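The propagation step described in the abstract can be pictured with a small sketch. This is a deliberately simplified illustration, not the authors' algorithm: it assumes a plain dependence graph and a worklist, and it leaves out the extended SSA construction, the memory SSA and gated SSA annotations, control dependence, and the interprocedural extension. The function name `propagate_divergence`, the graph encoding, and the toy kernel in the comments are all hypothetical.

```python
from collections import deque


def propagate_divergence(edges, seeds):
    """Return every node reachable from the divergent seeds.

    edges: dict mapping a node to the nodes whose value depends on it
    seeds: nodes that start out divergent (e.g. a work-item id)
    """
    divergent = set(seeds)
    worklist = deque(seeds)
    while worklist:
        node = worklist.popleft()
        for user in edges.get(node, ()):
            if user not in divergent:
                divergent.add(user)
                worklist.append(user)
    return divergent


# Hypothetical toy kernel, written as dependence edges:
#   tid = get_global_id(0)   -> divergent seed
#   p   = &a[tid]            -> pointer depends on tid
#   q   = &b[0]              -> pointer depends only on the uniform base b
#   v   = *p ; w = *q
edges = {
    "tid": ["p"],
    "p": ["v"],
    "b": ["q"],
    "q": ["w"],
}
result = propagate_divergence(edges, ["tid"])
```

In this toy graph only `tid`, `p`, and `v` end up divergent, while `q` and `w` stay uniform. A scheme that marks every pointer as divergent would flag `q` as well, which is the kind of imprecision the abstract's pointer counts (146 versus 200) reflect.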

Journal: ACM Transactions on Parallel Computing
Publication Date: Oct 15, 2021
Citations: 8

Similar Papers

Practical improvements to the construction and destruction of static single assignment form
Preston Briggs ... Timothy J Harvey
Software: Practice and Experience | VOL. 28
10 Jul 1998

Value based redundancy detection in SSA code
C M Akhila ... Nabizath Saleena
01 Dec 2016

Improved bitwidth-aware variable packing
V Krishna Nandivada ... Rajkishore Barik
ACM Transactions on Architecture and Code Optimization | VOL. 10
16 Sep 2013

Improving Performance of GPU Specific OpenCL Program on CPUs
Qiang Lan ... Huayou Su
01 Dec 2012
