Abstract

To accelerate the Krylov subspace-based linear equation solvers on Graphics Processing Units (GPUs), a stable, efficient and highly parallel preconditioner is essential. One of the strong candidates for such a preconditioner is the combination of the block red–black ordering and the relaxed modified incomplete LU factorization without fill-ins (MILU(0)). In this paper, we present techniques for implementing this type of preconditioner on General-purpose computing on GPU (GPGPU) using OpenACC. Our implementation is designed for 3-dimensional finite-difference computations with 7-point stencil, and the matrix storage format is optimized to realize coalesced memory access. Also, mixed-precision computation is employed to exploit the high single-precision performance of GPUs without sacrificing the accuracy of the computed solution. Extensive numerical tests were performed and the optimal values of various tunable parameters such as the number of blocks in each direction and the number of workers specified in OpenACC clauses are discussed. Performance comparison on NVIDIA Quadro GP100 and Tesla K40t GPUs shows that our solver is much faster than existing libraries like cuSPARSE, MAGMA, ViennaCL, and Ginkgo, especially when multiple linear equations with coefficient matrices sharing the same nonzero pattern are solved.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call