Abstract

Graphics processing units (GPUs) have inadvertently become supercomputers in their own right, to the benefit of applications outside of graphics. Acceleration of multiple orders of magnitude has been achieved in scientific computing, co-processing, and the like. However, the Single Instruction Multiple Data (SIMD) design of GPUs is extremely sensitive to thread divergence, so much so that, for many applications, the performance improvement from GPUs is all but eviscerated by it. This problem has driven general-purpose GPU computing in the direction of finding "appropriate" applications to accelerate, rather than accelerating applications with a need for performance improvements. Thread divergence is generally caused by branches. Previous research has addressed the issue of reducing branches, but none of this work aims to entirely eliminate them, because the methods required for complete branch elimination are a drastic de-optimization for CPUs. We present Algorithm Flattening (AF), a de-optimization for CPUs that completely removes all branches and results in a significant optimization for GPU-accelerated applications. AF eliminates thread divergence, substantially decreases execution time, allows the implementation on GPUs of algorithms that previously could not fully utilize GPU capability, and yields deterministic performance. AF removes branches, replacing them with a reduced equation, and achieves a substantial speedup of already GPU-accelerated algorithms and applications. We believe that AF will have a significant impact on high performance computing, as it is a long-needed solution that allows unprecedented use of GPUs for general-purpose applications.
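As a minimal sketch of the general idea the abstract describes (not the paper's actual AF transformation, whose reduction rules are not given here), the CUDA fragment below shows a data-dependent branch folded into a single branch-free arithmetic expression via a 0/1 predicate, so that all lanes of a warp execute identical instructions. The kernel names and the toy computation are hypothetical.

#include <cstdio>
#include <cuda_runtime.h>

// Divergent version: lanes in a warp take different paths depending on data.
__global__ void branchy(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        if (in[i] > 0)
            out[i] = in[i] * 2;   // taken by some lanes
        else
            out[i] = in[i] + 1;   // taken by the others
    }
}

// Flattened version: both outcomes are combined into one expression using a
// 0/1 predicate, so no lane diverges on the data-dependent condition.
// (The i < n bounds guard is a standard launch-size check, kept for safety;
// the data-dependent branch is what gets flattened.)
__global__ void flattened(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int p = in[i] > 0;   // predicate evaluates to 1 or 0
        out[i] = p * (in[i] * 2) + (1 - p) * (in[i] + 1);
    }
}

int main() {
    const int n = 8;
    int h_in[n] = {-3, -1, 0, 1, 2, 5, -7, 4}, h_out[n];
    int *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    flattened<<<1, n>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%d ", h_out[i]);
    printf("\n");
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

Both kernels compute the same results; the flattened one trades a pair of cheap extra instructions per lane for the elimination of warp divergence, which is the trade-off the abstract argues pays off on SIMD hardware.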
