Abstract

Parallel algorithms for solving the sparse symmetric matrix systems that arise from the cell-centered finite difference (CCFD) scheme are compared. Parallelization is based on partitioning the mass matrix so that each partition is handled by a separate process. These processes may then be distributed among a networked cluster of processors using the standard Message-Passing Interface (MPI), which allows multiple simultaneous processes to coordinate and exchange information toward a common purpose. In this study, partitioning of the mass matrix is based on decomposing the CCFD domain into non-overlapping subdomains; each subdomain corresponds to a partition of the mass matrix. The subdomain partitions are numbered alternately using a red/black numbering scheme. Partitions are linked by the CCFD coefficients corresponding to cell edges that coincide with subdomain boundaries internal to the domain. A major portion of this work examines the best way to handle connectivity information between partitions. The algorithms considered in this study are based on the preconditioned conjugate gradient (PCG) scheme and differ only in the preconditioning used. Parallelization of the PCG solver entails running essentially identical conjugate-gradient loops on separate processes, one for each subdomain partition. These loops must exchange information globally to compute inner products and locally to share connectivity information between adjacent partitions. The classic incomplete Cholesky preconditioner with zero fill (IC(0)) is used as the standard for comparison. Another preconditioning scheme considered is based principally on an approximate block Gaussian (BG) iterative solution to the problem. In both of these preconditioners, arrays containing connectivity information are passed between processes corresponding to adjacent partitions. Because of this need to pass connectivity information, the incomplete Cholesky preconditioner is limited, in practice, to zero fill. This limitation can be partially alleviated by using a BG iteration as a preconditioner, which requires approximate solves of the individual matrix partitions. These approximate solves can be carried out with any number of preconditioners, including IC(0); however, BG iteration introduces an additional level of approximation, with the result that BG iteration with approximate IC(0) solves is less efficient than using simple IC(0) as the preconditioner in the parallel PCG algorithm. Preconditioners based on the Jacobi scheme are also considered, as they require no connectivity information. These preconditioners represent a trade-off: lower communication costs at the expense of increased work to reach convergence. We are presently exploring variants of these methods for use as preconditioners in the parallelized conjugate gradient scheme and expect to report on the results of this work.

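The following is a minimal sketch, not taken from the paper itself, of the parallel PCG structure described above: each MPI process runs an essentially identical conjugate-gradient loop over its own partition, global MPI_Allreduce calls supply the inner products, and point-to-point exchanges with adjacent partitions supply the connectivity (boundary) values needed for the matrix-vector product. For concreteness the sketch uses a one-dimensional Laplacian split into contiguous blocks and a Jacobi (diagonal) preconditioner, which, as noted in the abstract, requires no connectivity information; all names and parameters (dot, halo_matvec, the partition size, the iteration limit, and so on) are illustrative assumptions rather than the authors' code.

/*
 * Sketch only: distributed PCG with a Jacobi preconditioner, showing the
 * two communication patterns named in the abstract:
 *   - global MPI_Allreduce for inner products, and
 *   - point-to-point exchange of boundary ("connectivity") values between
 *     adjacent partitions inside the matrix-vector product.
 * Test matrix: 1-D Laplacian, one contiguous block per MPI rank.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Global inner product: local dot product followed by an all-reduce. */
static double dot(const double *a, const double *b, int n)
{
    double loc = 0.0, glob = 0.0;
    for (int i = 0; i < n; ++i) loc += a[i] * b[i];
    MPI_Allreduce(&loc, &glob, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return glob;
}

/* y = A x for a 1-D Laplacian distributed over ranks.  Boundary values are
 * exchanged with the neighbouring ranks first, mimicking the connectivity
 * arrays passed between adjacent subdomain partitions. */
static void halo_matvec(const double *x, double *y, int n, int rank, int size)
{
    double left = 0.0, right = 0.0;           /* zero = physical boundary */
    int lo = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int hi = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    MPI_Sendrecv(&x[0],     1, MPI_DOUBLE, lo, 0,
                 &right,    1, MPI_DOUBLE, hi, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&x[n - 1], 1, MPI_DOUBLE, hi, 1,
                 &left,     1, MPI_DOUBLE, lo, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    for (int i = 0; i < n; ++i) {
        double xm = (i > 0)     ? x[i - 1] : left;
        double xp = (i < n - 1) ? x[i + 1] : right;
        y[i] = 2.0 * x[i] - xm - xp;
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 100;                  /* unknowns per partition (illustrative) */
    double *x = calloc(n, sizeof *x);   /* solution, start from zero             */
    double *r = malloc(n * sizeof *r);  /* residual                              */
    double *z = malloc(n * sizeof *z);  /* preconditioned residual               */
    double *p = malloc(n * sizeof *p);  /* search direction                      */
    double *q = malloc(n * sizeof *q);  /* A p                                   */

    for (int i = 0; i < n; ++i) r[i] = 1.0;   /* r = b - A*0 with b = 1          */

    /* Jacobi preconditioning: divide by the diagonal (2 for this matrix).
     * Note that it requires no communication at all. */
    for (int i = 0; i < n; ++i) { z[i] = r[i] / 2.0; p[i] = z[i]; }
    double rz = dot(r, z, n);

    for (int it = 0; it < 2000; ++it) {
        halo_matvec(p, q, n, rank, size);
        double alpha = rz / dot(p, q, n);
        for (int i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * q[i]; }

        double rnorm = sqrt(dot(r, r, n));
        if (rank == 0 && it % 50 == 0) printf("iter %4d  ||r|| = %g\n", it, rnorm);
        if (rnorm < 1e-8) break;

        for (int i = 0; i < n; ++i) z[i] = r[i] / 2.0;   /* Jacobi solve  */
        double rz_new = dot(r, z, n);
        double beta = rz_new / rz;
        rz = rz_new;
        for (int i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
    }

    free(x); free(r); free(z); free(p); free(q);
    MPI_Finalize();
    return 0;
}

A sketch like this would typically be built with an MPI wrapper compiler (e.g. mpicc) and launched with mpirun or mpiexec. In the IC(0) and BG preconditioners discussed in the abstract, the same two communication patterns appear, with additional connectivity arrays exchanged between adjacent partitions during the preconditioner application itself.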