Stencil computation is an important kind of computational mode that widely used in numeric. It iteratively updates the values of the spatial grid points over multiple time steps according to a given pattern. Existing techniques suffer from high complexity of massive iterative computations. Convolution and fast Fourier transform (FFT) provide the possibility to avoid massive iterations and reduce time complexity of stencil computation. However, current convolution-based fast stencil algorithms cannot effectively solve the problems with aperiodic boundary conditions that are common in practical applications. In this paper, we present a novel algorithm, TurboStencil, for linear stencil computations with aperiodic boundary condition. TurboStencil provides a padding method, eliminating the effects of boundary conditions, to enable convolution for all grid points. For symmetric stencil, TurboStencil only computes once by applying FFT, and thus achieves the time complexity of O(NlogN), where N is the grid data size. For asymmetric stencil, TurboStencil also only computes several times by employing a divide-and-conquer method and FFT, and exhibits a lower complexity than existing stencil algorithms. Experimental results demonstrate that TurboStencil outperforms the state-of-the-art convolution-based fast stencil algorithm by up to 777.1×, 43.2×, and 9.4× for symmetric stencil, and 12.9×, 2.0×, and 1.3× for asymmetric stencil, respectively, on 1D, 2D, and 3D benchmarks.
Read full abstract