Algorithm-based fault recovery of adaptively refined parallel multilevel grids

Linda Stals

doi:10.1177/1094342017720801

Abstract

On future extreme-scale computers, faults are expected to become an increasingly serious problem: as the number of individual components grows, failures become more frequent. This is driving interest in algorithms with built-in fault tolerance that can continue to operate, and can replace lost data, even if part of the computation is lost in a failure. For fault-free computations, the use of adaptive refinement techniques in combination with finite element methods is well established. Furthermore, iterative solution techniques that incorporate information about the grid structure, such as the parallel geometric multigrid method, have been shown to be an efficient approach to solving various types of partial differential equations. In this article, we present an advanced parallel adaptive multigrid method that uses dynamic data structures to store a nested sequence of meshes and the iteratively evolving solution. After a fail-stop fault, the data residing on the faulty processor is lost. However, with suitably designed data structures, the neighbouring processors contain enough information to reconstruct a consistent mesh in the faulty domain, so that the computation can resume without restarting from scratch. This recovery is based on a set of distributed algorithms that build on the existing parallel adaptive refinement routines, which must be carefully augmented and extended.
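The recovery idea in the abstract can be illustrated with a deliberately small sketch. It is not the article's algorithm: it assumes a uniform one-dimensional grid, a simple block partition, and one layer of ghost nodes per neighbour, and the names `Subdomain`, `partition`, and `recover` are hypothetical. It only shows how copies held by neighbouring processors can be enough to rebuild a lost subdomain and restart from an approximate solution.

```python
from dataclasses import dataclass, field


@dataclass
class Subdomain:
    """Data held by one (simulated) processor. Hypothetical layout."""
    rank: int
    nodes: dict                                  # owned: global index -> value
    ghosts: dict = field(default_factory=dict)   # copies of neighbours' nodes


def partition(values, nprocs):
    """Block-partition a 1D grid; each part keeps one ghost node per side."""
    n = len(values)
    size = n // nprocs
    subs = []
    for r in range(nprocs):
        lo = r * size
        hi = n if r == nprocs - 1 else lo + size
        owned = {i: values[i] for i in range(lo, hi)}
        ghosts = {i: values[i] for i in (lo - 1, hi) if 0 <= i < n}
        subs.append(Subdomain(r, owned, ghosts))
    return subs


def recover(subs, failed, n_nodes):
    """Rebuild the failed subdomain using only data held by the survivors."""
    survivors = [s for s in subs if s.rank != failed]

    # Everything the survivors still know: owned values and ghost copies.
    available = {}
    for s in survivors:
        available.update(s.ghosts)
        available.update(s.nodes)

    # Nodes that no surviving processor owns belong to the lost subdomain.
    lost = [i for i in range(n_nodes)
            if not any(i in s.nodes for s in survivors)]
    known = sorted(available)

    rebuilt = {}
    for i in lost:
        if i in available:            # a neighbour held a ghost copy
            rebuilt[i] = available[i]
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            rebuilt[i] = available[right]
        elif right is None:
            rebuilt[i] = available[left]
        else:                         # linear interpolation between known nodes
            t = (i - left) / (right - left)
            rebuilt[i] = (1 - t) * available[left] + t * available[right]

    # Re-establish the ghost layer of the replacement subdomain.
    lo, hi = lost[0], lost[-1] + 1
    ghosts = {i: available[i] for i in (lo - 1, hi) if i in available}
    return Subdomain(failed, rebuilt, ghosts)


if __name__ == "__main__":
    u = [2 * (i / 8) + 1 for i in range(9)]   # samples of a linear function
    parts = partition(u, 3)
    replacement = recover(parts, failed=1, n_nodes=9)
    print(replacement)
```

In this toy setting the interpolated values are only an initial guess; an iterative solver such as multigrid would then continue from the rebuilt state rather than restarting from scratch.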


Journal: The International Journal of High Performance Computing Applications
Publication Date: Aug 23, 2017
Citations: 4

