Identifying the Root Causes of Wait States in Large-Scale Parallel Applications

David Böhme, Markus Geimer, Felix Voigtlaender, Lukas Arnold, Felix Wolf

doi:10.1145/2934661

Abstract

Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira, Jr., et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. By replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances, even for runs with hundreds of thousands of processes.
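To make the propagation effect described in the abstract concrete, the following is a minimal toy sketch, not the trace-replay method of the paper. It assumes a one-directional pipeline in which every rank starts at time 0, computes for a given duration, then blocks on a receive from its left neighbor before sending to its right neighbor, with zero message transfer time. The function name `pipeline_waits` and the workload values are hypothetical and chosen only for illustration.

```python
def pipeline_waits(work):
    """Return the waiting time of each rank in a toy pipeline.

    Assumptions (illustrative only): all ranks start at time 0, rank i
    computes for work[i], then receives from rank i-1 (if any) and
    immediately sends to rank i+1. Message transfer takes no time.
    """
    waits = []
    send_time = None  # time at which the previous rank issues its send
    for compute_time in work:
        recv_enter = compute_time  # rank enters the receive after computing
        if send_time is None:
            wait = 0.0  # rank 0 has nothing to receive
        else:
            # The rank idles whenever its sender is later than itself.
            wait = max(0.0, send_time - recv_enter)
        waits.append(wait)
        # The rank forwards the message as soon as its receive completes.
        send_time = recv_enter + wait
    return waits


# A single delay of 4 time units on rank 0 ...
waits = pipeline_waits([5.0, 1.0, 1.0, 1.0])
# ... shows up as waiting time on every downstream rank.
total_cost = sum(waits)
```

In this toy setting the excess computation on rank 0 is the only imbalance, yet the summed waiting time across the pipeline is three times the size of the delay itself, which is the kind of cost a root-cause analysis would attribute back to rank 0 rather than to the ranks that wait.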

Journal: ACM Transactions on Parallel Computing
Publication Date: Jul 20, 2016
Citations: 42

Similar Papers

Identifying the Root Causes of Wait States in Large-Scale Parallel Applications
David Böhme ... Lukas Arnold
01 Sep 2010

Scalable Critical-Path Based Performance Analysis
David Böhme ... Bronis R De Supinski
01 May 2012

Characterizing Load and Communication Imbalance in Large-Scale Parallel Applications
David Böhme ... Felix Wolf
01 May 2012

Scalable detection of MPI-2 remote memory access inefficiency patterns
Marc-André Hermanns ... Markus Geimer
The International Journal of High Performance Computing Applications | VOL. 26
08 Jun 2011
