Abstract

The parallel strong-scaling of iterative methods is often determined by the number of global reductions at each iteration. Low-synch Gram–Schmidt algorithms are applied here to the Arnoldi algorithm to reduce the number of global reductions and therefore to improve the parallel strong-scaling of iterative solvers for nonsymmetric matrices such as the GMRES and the Krylov–Schur iterative methods. In the Arnoldi context, the QR factorization is “left-looking” and processes one column at a time. Among the methods for generating an orthogonal basis for the Arnoldi algorithm, the classical Gram–Schmidt algorithm, with reorthogonalization (CGS2) requires three global reductions per iteration. A new variant of CGS2 that requires only one reduction per iteration is presented and applied to the Arnoldi algorithm. Delayed CGS2 (DCGS2) employs the minimum number of global reductions per iteration (one) for a one-column at-a-time algorithm. The main idea behind the new algorithm is to group global reductions by rearranging the order of operations. DCGS2 must be carefully integrated into an Arnoldi expansion or a GMRES solver. Numerical stability experiments assess robustness for Krylov–Schur eigenvalue computations. Performance experiments on the ORNL Summit supercomputer then establish the superiority of DCGS2 over CGS2.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call