In this study we propose a novel method to parallelize high-order compact numerical algorithms for the solution of three-dimensional PDEs in a space–time domain. For such a numerical integration most of the computer time is spent in computation of spatial derivatives at each stage of the Runge–Kutta temporal update. The most efficient direct method to compute spatial derivatives on a serial computer is a version of Gaussian elimination for narrow linear banded systems known as the Thomas algorithm. In a straightforward pipelined implementation of the Thomas algorithm processors are idle due to the forward and backward recurrences of the Thomas algorithm. To utilize processors during this time, we propose to use them for either nonlocal data-independent computations, solving lines in the next spatial direction, or local data-dependent computations by the Runge–Kutta method. To achieve this goal, control of processor communication and computations by a static schedule is adopted. Thus, our parallel code is driven by a communication and computation schedule instead of the usual “creative programming” approach. The obtained parallelization speed-up of the novel algorithm is about twice as much as that for the basic pipelined algorithm and close to that for the explicit DRP algorithm. Use of the algorithm is demonstrated and comparisons with other schemes are given.
Read full abstract