Abstract

The scalability of Krylov subspace methods on parallel machines suffers from the costly global synchronization steps that arise in dot products and norm calculations. In this work, a modified preconditioned Conjugate Gradient (CG) method is presented that removes these global synchronization steps from the standard CG algorithm by reducing them to a single non-blocking reduction per iteration. This global communication phase can be overlapped with the matrix-vector product, which typically requires only local communication. The resulting algorithm is referred to as pipelined CG. An alternative pipelined method, mathematically equivalent to the Conjugate Residual (CR) method, which makes different trade-offs with regard to scalability and serial runtime, is also considered. These methods are compared to a recently proposed asynchronous CG algorithm by Gropp. Extensive numerical experiments demonstrate the numerical stability of the methods. Moreover, it is shown that hiding the global synchronization step improves scalability on distributed memory machines using the message passing paradigm and leads to significant speedups compared to standard preconditioned CG.
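
The core mechanism can be illustrated with a minimal sketch, assuming MPI-3 non-blocking collectives; this is not the authors' full pipelined CG algorithm, only the overlap pattern it relies on. The two dot products of an iteration are fused into a single MPI_Iallreduce, whose latency is hidden behind the local part of the sparse matrix-vector product. The kernel local_spmv() and the vector names are illustrative placeholders.

```c
#include <mpi.h>

/* Placeholder for the application's local SpMV kernel: z = A*u on this
 * rank's rows (a diagonal matrix here, purely for illustration). */
static void local_spmv(int n, const double *u, double *z)
{
    for (int i = 0; i < n; ++i)
        z[i] = 2.0 * u[i];
}

/* One overlapped step: start the single global reduction of the
 * iteration, compute the SpMV while it is in flight, then wait for
 * the reduced values. */
void overlapped_step(int n, const double *r, const double *w,
                     const double *u, double *z, double dots[2],
                     MPI_Comm comm)
{
    double local[2] = {0.0, 0.0};
    for (int i = 0; i < n; ++i) {
        local[0] += r[i] * r[i];   /* local part of (r, r) */
        local[1] += w[i] * r[i];   /* local part of (w, r) */
    }

    /* Non-blocking reduction: the only global synchronization point. */
    MPI_Request req;
    MPI_Iallreduce(local, dots, 2, MPI_DOUBLE, MPI_SUM, comm, &req);

    /* Matrix-vector product overlaps with the global reduction. */
    local_spmv(n, u, z);

    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```

In the standard preconditioned CG algorithm, by contrast, each blocking reduction must complete before the next kernel can start, so its latency sits on the critical path of every iteration.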
