Noise-Tolerant Explicit Stencil Computations for Nonuniform Process Execution Rates

Adam Hammouda, Andrew R Siegel, Stephen F Siegel

doi:10.1145/2742351

https://doi.org/10.1145/2742351

Abstract

Next-generation HPC platforms are likely to be characterized by significant, unpredictable nonuniformities in execution time among compute nodes and cores. The resulting load imbalances are expected to arise from a variety of sources, for example manufacturing discrepancies, dynamic power management, runtime component failure, OS jitter, software-mediated resiliency, and TLB/cache performance variations. It is well understood that existing algorithms with frequent points of bulk synchronization will perform relatively poorly in the presence of these sources of process nonuniformity. Thus, recasting classic bulk synchronous algorithms into more asynchronous, coarse-grained parallelism is a critical area of research for next-generation computing. We propose a class of parallel algorithms for explicit stencil computations that can tolerate these nonuniformities by decoupling per-process communication and computation, so that each process can progress asynchronously while maintaining solution correctness. These algorithms are benchmarked with a 1D domain-decomposed (“slabbed”) implementation of the 2D heat equation as a model problem, and are tested in the presence of simulated nonuniform process execution rates. The resulting performance is compared to a classic bulk synchronous implementation of the model problem. Results show that the runtime of this article’s algorithm on a machine with simulated process nonuniformities is 5–99% slower than the runtime of its classic counterpart on a machine free of nonuniformities. However, when both algorithms are run on a machine with comparable synthetic process nonuniformities, this article’s algorithm is 1–37 times faster than its classic counterpart.
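To make the model problem concrete, the following is a minimal serial sketch of an explicit stencil update for the 2D heat equation, together with a 1D "slabbed" version that updates each slab from its own rows plus one ghost row on each side. It illustrates only the classic bulk synchronous baseline, not the asynchronous algorithms proposed in the article. The function names `step` and `slab_step`, the fixed-value boundary treatment, and the parameter `alpha` (standing for diffusivity times time step divided by grid spacing squared) are assumptions made for illustration and do not come from the paper.

```python
def step(u, alpha):
    """One explicit (forward-time, centered-space) update of the interior points.

    Boundary rows and columns are held fixed (an assumed boundary condition).
    """
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = u[i][j] + alpha * (
                u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                - 4.0 * u[i][j]
            )
    return new


def slab_step(u, alpha, nslabs):
    """The same update, computed slab by slab over contiguous blocks of rows.

    Each slab sees only its owned rows plus one ghost row above and below,
    which is the data a process would receive from its neighbours.
    """
    n = len(u)
    interior = n - 2
    # Row ranges [bounds[k], bounds[k+1]) partition the interior rows 1..n-2.
    bounds = [1 + (interior * k) // nslabs for k in range(nslabs + 1)]
    new = [row[:] for row in u]
    for k in range(nslabs):
        lo, hi = bounds[k], bounds[k + 1]
        local = [row[:] for row in u[lo - 1:hi + 1]]  # owned rows + 2 ghost rows
        updated = step(local, alpha)
        new[lo:hi] = updated[1:-1]  # keep owned rows, discard ghost rows
    return new
```

Because every slab reads its ghost rows from the previous time level, `slab_step` reproduces `step` exactly; in a distributed setting, filling those ghost rows at every step is the synchronization point that the abstract identifies as sensitive to nonuniform process rates. In this sketch `alpha` should not exceed 0.25, the usual stability limit for this explicit scheme on a uniform 2D grid.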
