Abstract

Thread coarsening on GPUs combines the work of several threads into one. We show how thread coarsening can be implemented as a fully automated compile-time optimisation that estimates the optimal coarsening factor from a low-cost, approximate static analysis of cache-line re-use and an occupancy prediction model. We evaluate two coarsening strategies on three different NVIDIA GPU architectures. For NVIDIA reduction kernels we achieve a maximum speedup of 5.08x, and for the Rodinia benchmarks we achieve a mean speedup of 1.30x across the 8 of 19 kernels that were determined safe to coarsen.
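
To make the transformation concrete, the sketch below contrasts a plain element-wise CUDA kernel with a variant coarsened by a factor of 2, where each thread performs the work of two original threads. This is an illustrative example under assumed names (`scale`, `scale_coarsened`, `COARSEN`), not the paper's implementation or either of its specific coarsening strategies.

```cuda
#include <cuda_runtime.h>

// Baseline: one thread per element.
__global__ void scale(float *out, const float *in, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a * in[i];
}

// Coarsening factor (illustrative; a compiler pass would choose this).
#define COARSEN 2

// Coarsened variant: the grid is launched with 1/COARSEN as many threads,
// and each thread covers COARSEN iterations of the original kernel.
// Striding by the total thread count keeps consecutive threads on
// consecutive addresses, so global memory accesses stay coalesced.
__global__ void scale_coarsened(float *out, const float *in, float a, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int k = 0; k < COARSEN; ++k) {
        int i = tid + k * stride;
        if (i < n)
            out[i] = a * in[i];
    }
}
```

The coarsened kernel trades parallelism for per-thread work: fewer threads and blocks are launched, which can reduce occupancy, but redundant instructions are amortised and register-level re-use increases, which is why a cost model is needed to pick the factor.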
