Abstract

DRASync provides region-based allocation for MPI programs with pointer-based data structures. Its main features are that it amortizes communication among MPI processes to allow efficient parallel allocation in a global address space, and that it takes advantage of bulk deallocation and the good locality of pointer-based data structures. Finally, DRASync supports ownership semantics of regions by MPI processes akin to reader–writer locks (sketched below by analogy with standard MPI one-sided locks), which makes for a high-level, intuitive synchronization tool in MPI programs without sacrificing message-passing performance.

In "An Evaluation of MPI Message Rate on Hybrid-Core Processors", Barrett et al. analyze the ability of simple and more complex cores to perform MPI matching operations under various scenarios. The authors compare throughput-oriented cores with single-thread-optimized cores in terms of their ability to perform MPI match processing (the matching semantics are illustrated below). The intent of the study is to determine whether throughput-oriented cores can adequately perform MPI matching and to better understand how MPI implementations on future hybrid-core processors should allocate computing resources to performance-critical MPI operations.

Collective MPI communications must be executed by all processes in their communicator in the same order and the same number of times; otherwise the program does not conform to the standard and a deadlock can occur. As soon as the control flow involving these collective operations becomes more complex, in particular when it includes conditionals on process ranks, ensuring the correctness of such code becomes error-prone (a minimal example of such a mismatch appears below). The paper "PARCOACH: Combining Static and Dynamic Validation of MPI Collective Communications" by Saillard et al. proposes a static analysis that detects when such a situation can occur, combined with a code transformation that prevents deadlocking. The authors show on several benchmarks that the impact on performance is small and that their techniques are easy to integrate into the development process.

In "Extreme-scale Computing Services Over MPI: Experiences, Observations and Features Proposal for Next Generation Message Passing Interface", Zounmevo et al. present their experiences in using MPI as the network transport for a large-scale distributed storage system. The authors discuss the features of MPI that facilitate adoption as well as the aspects that require workarounds. Based on their use cases, they derive a wish list for both MPI implementations and the MPI Forum to facilitate the adoption of MPI for large-scale persistent services.

Finally, in "Optimization of MPI collective operations on the IBM Blue Gene/Q Supercomputer", Kumar et al. present scalable algorithms that optimize MPI collective operations by taking advantage of the various features of the Blue Gene/Q torus and collective networks. The authors accelerate the summing of network packets with local buffers by using the Quad Processing SIMD unit of the Blue Gene/Q cores and by executing the sums on multiple communication threads supported by the optimized communication libraries.
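The region ownership semantics mentioned for DRASync are specific to that allocator; the C sketch below only illustrates the reader–writer analogy using standard MPI-3 passive-target synchronization (MPI_LOCK_SHARED versus MPI_LOCK_EXCLUSIVE on a window owned by rank 0). It is not DRASync's API, and the window, variable names, and access pattern are illustrative assumptions.

/* Analogy only: reader-writer-style access to memory exposed by rank 0,
 * using standard MPI-3 passive-target synchronization. This is not the
 * DRASync interface; it merely illustrates shared vs. exclusive ownership. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;
    MPI_Win win;
    MPI_Win_create(&value, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == 0) {
        /* "Writer": exclusive ownership of the data on rank 0. */
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
        int one = 1;
        MPI_Put(&one, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
        MPI_Win_unlock(0, win);
    } else {
        /* "Readers": shared ownership; concurrent reads are allowed.
         * A read may observe the value before or after the update;
         * ordering is not the point of this sketch. */
        int copy;
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
        MPI_Get(&copy, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
        MPI_Win_unlock(0, win);
        printf("rank %d read %d\n", rank, copy);
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}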
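For readers unfamiliar with the term, the "MPI matching" studied by Barrett et al. is the work the MPI library performs to pair each incoming message with a posted receive by communicator, source, and tag. The minimal C sketch below shows only these semantics; it says nothing about the paper's core-performance analysis, and the tags and values are arbitrary examples.

/* Illustration of MPI match processing: the library must pair each arriving
 * message with a posted receive by (communicator, source, tag). Run with at
 * least two processes. */
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, a = 0, b = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Request reqs[2];
        /* Two posted receives that differ only in tag: the matching logic
         * decides which incoming message completes which request. */
        MPI_Irecv(&a, 1, MPI_INT, 1, 10, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&b, 1, MPI_INT, 1, 20, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        /* Matching is by tag, not arrival order, so a == 1 and b == 2. */
    } else if (rank == 1) {
        int x = 1, y = 2;
        /* Sent in "reverse" tag order on purpose. */
        MPI_Send(&y, 1, MPI_INT, 0, 20, MPI_COMM_WORLD);
        MPI_Send(&x, 1, MPI_INT, 0, 10, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}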
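To make the collective-ordering requirement concrete, the following deliberately non-conforming C fragment shows the kind of rank-dependent control flow that PARCOACH targets: processes disagree on which collective they call first, so the collective sequences no longer match across the communicator and the program can deadlock. This is a generic illustration, not an example taken from the paper.

/* Deliberately non-conforming: not all processes call the same collectives
 * in the same order, which violates the MPI standard and can deadlock. */
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, sum = 0, one = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Rank 0 enters a barrier first... */
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Reduce(&one, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    } else {
        /* ...while the other ranks start with the reduction: the collective
         * sequences diverge across the communicator. */
        MPI_Reduce(&one, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}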
