Exploiting Limited Access Distance for Kernel Fusion Across the Stages of Explicit One-Step Methods on GPUs

Matthias Korch, Tim Werner

doi:10.1109/cahpc.2018.8645892

Abstract

The performance of explicit parallel methods solving large systems of ordinary differential equations (ODEs) on GPUs is often memory bound. Therefore, locality optimizations, such as kernel fusion, are desirable. This paper exploits a special property of a large class of right-hand-side (RHS) functions to enable the fusion of computations of blocks of components across multiple stages of the method. This leads to a tiling of the stages within one time step. Our approach is based on a representation of the ODE method by a data flow graph and allows efficient GPU code with fused kernels to be generated automatically for user-defined tilings. In particular, we investigate two generalized tiling strategies, trapezoidal and hexagonal tiling, which are evaluated experimentally for several different high-order Runge-Kutta (RK) methods.
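The dependency structure behind fusing stages can be illustrated with a small sketch. It is not the generated GPU code described in the abstract, and it uses plain overlapped tiles with redundant halo computation rather than the trapezoidal or hexagonal tilings. The following are assumptions made for illustration only: "limited access distance" is taken to mean that component j of the RHS reads only components j-D to j+D; the RHS is a 1D diffusion-like stencil with D = 1; the method is the classical four-stage RK method; and all names (`rhs`, `step_unfused`, `step_fused_tile`, `step_fused`) are hypothetical.

```python
# Sketch only: fuse all stages of one explicit RK time step for a block
# of components, assuming an RHS with limited access distance D.

D = 1  # assumed access distance: rhs(., j) reads only indices j-D .. j+D

# Classical 4-stage Runge-Kutta method (Butcher tableau), chosen for illustration.
A = [[0.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
B = [1.0 / 6.0, 1.0 / 3.0, 1.0 / 3.0, 1.0 / 6.0]


def rhs(y, j, n):
    """Component j of a diffusion-like RHS; neighbours outside [0, n) count as 0."""
    left = y[j - 1] if j >= 1 else 0.0
    right = y[j + 1] if j + 1 < n else 0.0
    return left - 2.0 * y[j] + right


def step_unfused(y, h):
    """Reference: each stage is evaluated over the whole system before the next."""
    n, s = len(y), len(B)
    k = []
    for i in range(s):
        arg = [y[j] + h * sum(A[i][l] * k[l][j] for l in range(i))
               for j in range(n)]
        k.append([rhs(arg, j, n) for j in range(n)])
    return [y[j] + h * sum(B[i] * k[i][j] for i in range(s)) for j in range(n)]


def step_fused_tile(y, h, a, b):
    """Compute the new values for components [a, b) with all stages fused.

    Stage i is evaluated on [a, b) widened by (s - 1 - i) * D on each side,
    so every later stage finds its inputs inside the tile's own data.
    """
    n, s = len(y), len(B)
    k = []  # k[i] maps a global component index to its stage-i value
    for i in range(s):
        halo = (s - 1 - i) * D
        lo, hi = max(0, a - halo), min(n, b + halo)
        # The stage argument is needed D components beyond [lo, hi).
        arg = {j: y[j] + h * sum(A[i][l] * k[l][j] for l in range(i))
               for j in range(max(0, lo - D), min(n, hi + D))}
        k.append({j: rhs(arg, j, n) for j in range(lo, hi)})
    return [y[j] + h * sum(B[i] * k[i][j] for i in range(s))
            for j in range(a, b)]


def step_fused(y, h, tile_width):
    """One time step, processed tile by tile; tiles are independent of each other."""
    n = len(y)
    out = []
    for a in range(0, n, tile_width):
        out.extend(step_fused_tile(y, h, a, min(n, a + tile_width)))
    return out


if __name__ == "__main__":
    y0 = [float((j * 7) % 5) for j in range(20)]
    ref = step_unfused(y0, 0.1)
    fused = step_fused(y0, 0.1, 6)
    print(max(abs(p - q) for p, q in zip(fused, ref)))
```

In this sketch each tile reads its inputs once and produces its part of the new solution without exchanging intermediate stage values with other tiles. The price is that the halo grows by D for every earlier stage, so neighbouring tiles recompute some stage values redundantly.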

Full Text