MPI Runtime Error Detection with MUST: Advances in Deadlock Detection

Tobias Hilbrich, Martin Schulz, Bronis R De Supinski, Matthias S Müller, Joachim Protze

doi:10.1155/2013/314971

Abstract

The widely used Message Passing Interface (MPI) is complex and rich. As a result, application developers require automated tools to avoid and to detect MPI programming errors. We present the Marmot Umpire Scalable Tool (MUST), which detects such errors with significantly increased scalability. We present improvements to our graph-based deadlock detection approach for MPI, which cover future MPI extensions. Our enhancements also check complex MPI constructs that no previous graph-based detection approach handled correctly. Finally, we present optimizations for the processing of MPI operations that reduce runtime deadlock detection overheads. Existing approaches often require 𝒪(p) analysis time per MPI operation for p processes. We empirically observe that our improvements lead to sub-linear or better analysis time per operation for a wide range of real-world applications.

Highlights

The Message Passing Interface (MPI) [10] is a de facto standard for parallel programming
This paper presents MUST (Marmot Umpire Scalable Tool, named after its predecessors), a runtime tool that overcomes the shortfalls of current tools by providing a scalable solution for efficient runtime MPI error checking
We present MUST, a novel runtime error detection tool for MPI applications

Summary

Introduction

The Message Passing Interface (MPI) [10] is a de facto standard for parallel programming. Other errors, such as messaging deadlocks or type mismatches in messages, require information about more than one process and need a non-local approach. These runtime tools must communicate information from the application processes to a process or thread that runs non-local correctness checks, which complicates their design and scalability. Since process 1 cannot issue the Barrier until process 2 sends the message, both processes block indefinitely. These wildcard receives, as well as other MPI constructs, can lead to interleaving-dependent MPI deadlocks, which occur only in some application runs.
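The two-process deadlock mentioned above can be pictured as a cycle in a wait-for graph. The following is a minimal sketch only, not MUST's implementation: it assumes a simplified graph in which every edge is a hard (AND) dependency, and the function name `find_wait_cycle` and the dictionary layout are hypothetical. The edges encode one reading of the summary: process 1 is blocked until process 2 sends, and process 2 is blocked in the Barrier until process 1 joins.

```python
from typing import Dict, List, Optional


def find_wait_cycle(wait_for: Dict[int, List[int]]) -> Optional[List[int]]:
    """Return one cycle of ranks in an AND-only wait-for graph, or None.

    wait_for maps a blocked rank to the ranks it is waiting on.
    (Hypothetical helper for illustration; not part of MUST or MPI.)
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {rank: WHITE for rank in wait_for}
    path: List[int] = []

    def visit(rank: int) -> Optional[List[int]]:
        color[rank] = GRAY
        path.append(rank)
        for target in wait_for.get(rank, []):
            state = color.get(target, WHITE)
            if state == GRAY:
                # target is already on the current path: cycle found
                return path[path.index(target):]
            if state == WHITE:
                found = visit(target)
                if found is not None:
                    return found
        path.pop()
        color[rank] = BLACK
        return None

    for rank in list(wait_for):
        if color[rank] == WHITE:
            cycle = visit(rank)
            if cycle is not None:
                return cycle
    return None


# Assumed reading of the example: rank 1 waits for a message from rank 2,
# while rank 2 waits in the Barrier for rank 1.
blocked = {1: [2], 2: [1]}
deadlock_cycle = find_wait_cycle(blocked)

# If rank 2 were not blocked, rank 1 could still make progress.
no_deadlock = find_wait_cycle({1: [2], 2: []})
```

Here `deadlock_cycle` contains both ranks and `no_deadlock` is `None`. A wildcard receive can be satisfied by any one of several senders, so its outgoing edges are alternatives rather than joint requirements; under that OR semantics a cycle alone is not a sufficient deadlock criterion, which is why this AND-only sketch does not cover wildcard receives.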

Related work

Runtime deadlock detection with MUST

Transformation

Example

Deadlock criterion

Optimized deadlock analysis

Runtime detection costs

Delayed WFG construction

Wildcard receive handling

Probing in MUST

Deciding in MUST

Application results

Conclusions


Journal: Scientific Programming
Publication Date: Jan 1, 2013
Citations: 14
License type: CC BY 3.0
