MPI-CUDA sparse matrix–vector multiplication for the conjugate gradient method with an approximate inverse preconditioner

G Oyarzun, R Borrell, A Gorobets, A Oliva

doi:10.1016/j.compfluid.2013.10.035

Abstract

The preconditioned conjugate gradient (PCG) method is one of the most prominent iterative methods for solving sparse linear systems with symmetric positive definite matrices, which arise, for example, in the modeling of incompressible flows. The method relies on a set of basic linear algebra operations that determine its overall performance. Therefore, to improve performance, the implementations of these basic operations must be adapted to changes in the architecture of parallel computing systems. In recent years, one strategy for increasing the computing power of supercomputers has been to use Graphics Processing Units (GPUs) as math co-processors alongside CPUs. This paper presents an MPI-CUDA implementation of the PCG solver for such hybrid computing systems composed of multiple CPUs and GPUs. Special attention is paid to the sparse matrix–vector multiplication (SpMV), because most of the solver's execution time is spent on this operation. The approximate inverse preconditioner, which is used to improve the convergence of the CG solver, is also based on the SpMV operation. Overlapping data transfer with computation is proposed in order to hide the MPI and CPU-GPU communications needed to perform parallel SpMVs. This strategy yields a considerable improvement and, as a result, the hybrid implementation of the PCG solver demonstrates a significant speedup over the CPU-only implementation.
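To make the role of SpMV concrete, the following is a minimal serial sketch in Python of a PCG loop in which both the system matrix and the preconditioner are applied as CSR sparse matrix–vector products. It is not the MPI-CUDA implementation described in the paper. All function names (`spmv`, `pcg`, `laplacian_1d_csr`, `diag_inverse_csr`) are hypothetical, the 1D Laplacian is only a small SPD test matrix, and the inverse of the diagonal is used as an assumed stand-in for a real approximate inverse preconditioner.

```python
import math

def spmv(row_ptr, col_idx, vals, x):
    """Return y = A x for a matrix A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[col_idx[k]]
        y[i] = s
    return y

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * a + b for a, b in zip(x, y)]

def pcg(A, M, b, tol=1e-10, max_iter=200):
    """Solve A x = b. A and M are CSR triples; M approximates inv(A)."""
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    z = spmv(*M, r)                  # preconditioning step is an SpMV
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for it in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        Ap = spmv(*A, p)             # SpMV with the system matrix
        alpha = rz / dot(p, Ap)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, Ap, r)
        z = spmv(*M, r)              # second SpMV of the iteration
        rz_new = dot(r, z)
        p = axpy(rz_new / rz, p, z)  # p = z + beta * p
        rz = rz_new
    return x, max_iter

def laplacian_1d_csr(n):
    """Tridiagonal (-1, 2, -1) matrix in CSR format; an assumed SPD test case."""
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals

def diag_inverse_csr(A):
    """Inverse of diag(A) in CSR format; a crude stand-in for an approximate inverse."""
    row_ptr, col_idx, vals = A
    n = len(row_ptr) - 1
    d = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                d[i] = vals[k]
    return list(range(n + 1)), list(range(n)), [1.0 / di for di in d]

if __name__ == "__main__":
    A = laplacian_1d_csr(8)
    M = diag_inverse_csr(A)
    b = spmv(*A, [1.0] * 8)          # right-hand side whose exact solution is all ones
    x, iters = pcg(A, M, b)
    print(iters, x)
```

Each iteration performs two SpMVs, two dot products, and three vector updates. In a distributed setting the SpMVs are the only operations that need neighbor-to-neighbor data exchange, which is why hiding their communication behind computation is the focus of the overlapping strategy.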

Full Text