The recent development of multicore technologies in modern desktop computers makes parallelization of numerical approaches a priority in algorithmic research. In the upcoming years, the main performance improvements of personal computers will come from the increasing number of cores on modern CPUs. This shifts the focus of algorithmic research from the development of sequential numerical methods to parallel methodology. This paper presents an efficient parallel direct algorithm with near-optimal complexity for the compact fourth- and sixth-order approximations of the three-dimensional Helmholtz equation (Turkel et al., 2013) with the problem coefficient depending on only one of the coordinate directions. The developed method is based on a combination of the separation of variables technique and a Fast Fourier Transform (FFT) type method. Similar direct solvers for lower-order approximations of the two- and three-dimensional Helmholtz equation were considered in several previous publications by the authors and other researchers (see, e.g., Gryazin et al. (2000); Gryazin (2014); Elman and O’Leary (1998); Elman and O’Leary (1999); Toivanen and Wolfmayr (2020)). The authors also consider a generalization of the presented algorithm to the solution of a wide class of linear systems obtained from approximations on compact 27-point three-dimensional stencils on rectangular grids, with similar requirements on the stencil coefficients. The general restrictions on the coefficients in the considered class of compact schemes are derived and presented. This class includes the second-, fourth- and sixth-order compact approximation schemes for the three-dimensional Helmholtz equation considered in this paper and our previous publications (Gryazin et al., 2000; Gryazin, 2014; Gryazin, 2014).
As an example of the diversity of applications of the developed general method, the direct parallel implementation of a compact fourth-order approximation scheme for a convection–diffusion equation is considered. Another goal of this paper is to investigate the scalability of the proposed technique for large linear systems using different parallel programming extensions. The results of implementing this method in OpenMP, MPI and hybrid programming environments on multicore computers and multi-node clusters are presented and discussed. The results demonstrate the high efficiency of the proposed direct solvers for many important applications on structured grids with the corresponding 27-diagonal matrices of sizes up to 10^11 by 10^11.
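The combination of separation of variables with a transform in the directions of constant coefficients can be illustrated with a minimal serial sketch. It is not the authors' solver: it assumes the standard second-order 7-point stencil rather than the compact fourth- or sixth-order 27-point schemes, zero Dirichlet boundary conditions, a uniform grid step `h`, and a coefficient `k2` that varies only in the third (z) direction. The names `sine_basis`, `lap_eigs` and `solve_helmholtz` are hypothetical, and the dense sine-basis matrices stand in for the FFT-type transform.

```python
import numpy as np

def sine_basis(n):
    # Orthonormal eigenvectors of the 1D Dirichlet second-difference matrix.
    # The matrix is symmetric and orthogonal, so it is its own inverse.
    j = np.arange(1, n + 1)
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * np.outer(j, j) / (n + 1))

def lap_eigs(n, h):
    # Eigenvalues of the 1D operator -d^2/dx^2 on the 3-point stencil.
    j = np.arange(1, n + 1)
    return (2.0 - 2.0 * np.cos(np.pi * j / (n + 1))) / h**2

def solve_helmholtz(f, k2, h):
    """Solve (Laplacian_h + k2(z)) u = f on an nx-by-ny-by-nz interior grid.

    Assumptions of this sketch: 7-point stencil, zero Dirichlet boundaries,
    k2 is a length-nz array depending on z only.
    """
    nx, ny, nz = f.shape
    Sx, Sy = sine_basis(nx), sine_basis(ny)
    lx, ly = lap_eigs(nx, h), lap_eigs(ny, h)

    # Step 1: transform the right-hand side in x and y
    # (an FFT-based sine transform would replace these dense products).
    fh = np.einsum('ai,bj,ijk->abk', Sx, Sy, f)

    # Step 2: one independent tridiagonal system in z per (x, y) mode pair.
    Dz = (np.eye(nz, k=1) + np.eye(nz, k=-1) - 2.0 * np.eye(nz)) / h**2
    Tz = Dz + np.diag(k2)
    uh = np.empty_like(fh)
    for a in range(nx):
        for b in range(ny):
            M = Tz - (lx[a] + ly[b]) * np.eye(nz)
            uh[a, b] = np.linalg.solve(M, fh[a, b])

    # Step 3: transform back to physical space.
    return np.einsum('ai,bj,abk->ijk', Sx, Sy, uh)
```

The one-dimensional systems in step 2 do not depend on one another, so in a sketch like this they are the natural unit of work to distribute across threads or processes. A production code would use a banded solver and a fast transform instead of the dense operations shown here.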