Abstract

Explicit parallel methods for solving large systems of ordinary differential equations (ODEs) on GPUs are often memory bound, so locality optimizations such as kernel fusion are desirable. This paper exploits a special property of a large class of right-hand-side (RHS) functions to enable the fusion of computations of blocks of components across multiple stages of the method. This leads to a tiling of the stages within one time step. Our approach is based on a representation of the ODE method by a data flow graph and allows efficient GPU code with fused kernels to be generated automatically for user-defined tilings. In particular, we investigate two generalized tiling strategies, trapezoidal and hexagonal tiling, which are evaluated experimentally for several different high-order Runge-Kutta (RK) methods.
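
The idea of fusing blocks of components across stages can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: the RHS property is taken to be a limited access distance (component j reads only components j-1, j, j+1), Heun's two-stage method stands in for the high-order RK methods, and each block recomputes a one-component halo instead of using the trapezoidal or hexagonal tilings. The code is NumPy on the CPU rather than generated GPU code, and all function names are hypothetical.

```python
import numpy as np

def rhs_window(w, off, lo, hi, n):
    """Evaluate a hypothetical stencil RHS for global components lo..hi-1.

    w holds the argument vector for global indices off..off+len(w)-1.
    Component j reads only j-1, j, j+1 (assumed access distance 1);
    the two boundary components are held fixed (RHS = 0).
    """
    out = np.zeros(hi - lo)
    j = np.arange(max(lo, 1), min(hi, n - 1))
    out[j - lo] = w[j - 1 - off] - 2.0 * w[j - off] + w[j + 1 - off]
    return out

def heun_step_reference(y, h):
    """Unfused Heun step: each stage sweeps over the whole vector."""
    n = y.size
    k1 = rhs_window(y, 0, 0, n, n)
    k2 = rhs_window(y + h * k1, 0, 0, n, n)
    return y + 0.5 * h * (k1 + k2)

def heun_step_tiled(y, h, block):
    """Fused Heun step: both stages are computed per block of components.

    Stage 1 is evaluated on the block plus a one-component halo so that
    stage 2 can be evaluated on the block without leaving the tile.
    """
    n = y.size
    y_new = np.empty_like(y)
    for a in range(0, n, block):
        b = min(a + block, n)
        lo, hi = max(a - 1, 0), min(b + 1, n)   # block plus halo
        k1 = rhs_window(y, 0, lo, hi, n)        # stage 1 on the halo region
        arg = y[lo:hi] + h * k1                 # stage 2 argument, local to the tile
        k2 = rhs_window(arg, lo, a, b, n)       # stage 2 on the block only
        y_new[a:b] = y[a:b] + 0.5 * h * (k1[a - lo:b - lo] + k2)
    return y_new

y0 = np.sin(np.linspace(0.0, np.pi, 37))
same = np.allclose(heun_step_tiled(y0, 0.1, 8), heun_step_reference(y0, 0.1))
```

In this sketch the halo is recomputed by neighbouring blocks, which trades redundant arithmetic for locality. A tiling that avoids such recomputation would need to split the work into dependent phases, which this sketch does not attempt.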
