Abstract

We investigate the effect of hard faults on a massively parallel implementation of the Sparse Grid Combination Technique (SGCT), an efficient numerical approach for the solution of high-dimensional time-dependent PDEs. The SGCT allows us to increase the spatial resolution of a solver to a level that is out of reach for classical discretization schemes due to the curse of dimensionality. We exploit the inherent data redundancy of this algorithm to obtain a scalable and fault-tolerant implementation without the need for checkpointing or process replication. It is a lossy approach that can guarantee convergence for a large number of faults and a wide range of applications. We present first results obtained with our fault simulation framework, including the first convergence and scalability results with simulated faults and algorithm-based fault tolerance for PDEs in more than three dimensions.
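
For readers unfamiliar with the combination technique, the following minimal sketch illustrates how a sparse grid solution is assembled as a weighted sum of solutions on coarse anisotropic grids. It is not taken from the paper: the function names, the level convention (l_i >= 1, |l|_1 <= n + d - 1) and the inclusion-exclusion form of the coefficients are assumptions chosen for illustration. The sketch only computes the combination coefficients; it does not solve a PDE or model faults.

```python
from itertools import product


def classical_index_set(dim, n):
    """Hypothetical helper: all level vectors l with l_i >= 1 and
    |l|_1 <= n + dim - 1 (one common convention, assumed here)."""
    return {l for l in product(range(1, n + 1), repeat=dim)
            if sum(l) <= n + dim - 1}


def combination_coefficients(index_set):
    """Coefficients c_l = sum over z in {0,1}^d of (-1)^{|z|} * [l + z in I].

    This inclusion-exclusion form assumes the index set I is
    downward-closed; grids with coefficient zero are omitted.
    """
    dim = len(next(iter(index_set)))
    coeffs = {}
    for l in index_set:
        c = 0
        for z in product((0, 1), repeat=dim):
            neighbour = tuple(li + zi for li, zi in zip(l, z))
            if neighbour in index_set:
                c += (-1) ** sum(z)
        if c != 0:
            coeffs[l] = c
    return coeffs


# Two-dimensional example with n = 3: three grids enter with weight +1
# and two coarser grids with weight -1.
coeffs_2d = combination_coefficients(classical_index_set(dim=2, n=3))
```

Each key of `coeffs_2d` is the level vector of one component grid and each value is the weight with which that grid's solution enters the combined solution. The weights sum to one, and many different component grids carry overlapping information, which is the kind of redundancy the abstract refers to.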
