Abstract

High-end computing is universally recognized to be a strategic tool for leadership in science and technology. A significant portion of high-end computing is conducted on clusters running the Message Passing Interface (MPI) library, and MPI has become a de facto standard in high-performance computing (HPC). MPI programs, as well as MPI library implementations, can be buggy, especially when aiming for high performance and when running on, or being ported to, new platforms. Our recent work has addressed the following areas: a TLA+ formal semantics of a large subset of MPI-1; a Microsoft Phoenix-based model extraction and analysis framework for MPI programs; integration into the Visual Studio environment for error-trace visualization; a new dynamic partial order reduction (DPOR) algorithm tailored to MPI, so that the number of interleavings examined during MPI program verification is dramatically reduced; a program called 'inspector' for analyzing C++ programs, which has found bugs in publicly distributed threaded programs (Inspector automatically instruments Pthread programs and searches for races using a new DPOR algorithm); verification of byte-range locking protocols that use MPI one-sided communication, a case study in which we found bugs in published byte-range locking protocols and designed and verified improved versions of them; and a new in-situ model checker for MPI programs that traps MPI calls using the MPI profiling interface (PMPI) and orchestrates control to maximize coverage with minimal state-saving overhead. The progress made in exploring these directions, our publications, and the associated software tools are described, as are our future plans.
