Efficient and Portable Workgroup Size Tuning

Chia-Lin Yu, Shiao-Li Tsao

doi:10.1109/tpds.2019.2937295

Abstract

The performance of an OpenCL program is strongly influenced by both hardware and software attributes. To achieve superior performance, developers may leverage automatic performance tuning techniques to determine the optimal parameters on the target device. Although existing approaches have shown promising tuning results in their target scenarios, other requirements such as efficiency, portability, and usability should also be considered because of the rapid growth of heterogeneous computing applications and platforms. In this paper, we re-examine the workgroup size tuning problem and propose a novel approach to meet the aforementioned requirements. We abstract the architectural details into a set of hardware parameters so that the proposed approach can be applied without the presence of target devices, which makes it more accessible to developers. The proposed approach is evaluated on 20 OpenCL kernels and six devices, including both CPUs and GPUs. Experimental results demonstrate that, with negligible overhead, our approach filters out 88.6 percent of the possible workgroup sizes on average. Among all the workgroup size candidates, the best- and worst-performing candidates can achieve average performance of 95.5 and 92.1 percent, respectively, compared with the optimal workgroup size.
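To make the idea of filtering workgroup sizes concrete, here is a minimal sketch. It is not the paper's method, and it assumes two illustrative hardware parameters that do not come from the abstract: a maximum workgroup size and a SIMD width. The function names are hypothetical. The sketch enumerates the workgroup sizes that evenly divide a one-dimensional global size, then keeps only those that are multiples of the assumed SIMD width.

```python
def candidate_workgroup_sizes(global_size, max_wg_size):
    """List every 1-D workgroup size that evenly divides global_size
    and does not exceed the device limit (max_wg_size is an assumed parameter)."""
    upper = min(global_size, max_wg_size)
    return [s for s in range(1, upper + 1) if global_size % s == 0]


def filter_by_simd_width(candidates, simd_width):
    """Keep sizes that are multiples of an assumed SIMD width.
    This heuristic is illustrative only, not the filter from the paper."""
    kept = [s for s in candidates if s % simd_width == 0]
    # Fall back to the unfiltered list rather than return nothing.
    return kept or list(candidates)


def filtered_fraction(candidates, kept):
    """Fraction of the original candidates that the filter discarded."""
    return 1.0 - len(kept) / len(candidates)


if __name__ == "__main__":
    all_sizes = candidate_workgroup_sizes(global_size=1024, max_wg_size=256)
    kept = filter_by_simd_width(all_sizes, simd_width=32)
    print(all_sizes)
    print(kept)
    print(f"filtered out: {filtered_fraction(all_sizes, kept):.1%}")
```

Because the filter needs only the parameter values and not the device itself, a sketch like this can run on any machine, which is the kind of device-independent use the abstract describes.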


Journal: IEEE Transactions on Parallel and Distributed Systems
Publication Date: Feb 1, 2020
Citations: 36

