MPI Communication Research Articles

Mesoscopic numerical simulations provide a unique approach for the quantification of the chemical influences on red blood cell functionalities. The transport Dissipative Particle Dynamics (tDPD) method can lead to such effective multiscale simulations due to its ability to simultaneously capture mesoscopic advection, diffusion, and reaction. In this paper, we present a GPU-accelerated red blood cell simulation package based on a tDPD adaptation of our red blood cell model, which can correctly recover the cell membrane viscosity, elasticity, bending stiffness, and cross-membrane chemical transport. The package essentially processes all computational workloads in parallel by GPU, and it incorporates multi-stream scheduling and non-blocking MPI communications to improve inter-node scalability. Our code is validated for accuracy and compared against the CPU counterpart for speed. Strong scaling and weak scaling are also presented to characterize scalability. We observe a speedup of 10.1 on one GPU over all 16 cores within a single node, and a weak scaling efficiency of 91% across 256 nodes. The program enables quick-turnaround and high-throughput numerical simulations for investigating chemical-driven red blood cell phenomena and disorders.

Program summary
Program Title: USERMESO 2.0
Program Files doi: http://dx.doi.org/10.17632/89849t3ngk.1
Licensing provisions: GNU General Public License, Version 3
Programming language: C/C++, CUDA C/C++, MPI.
Nature of problem: Particle-based simulation of a red blood cell suspension with chemical transport property.
Solution method: Each red blood cell is represented by a 3-D triangular mesh with bonded potential under area and volume constraints. The solvent is approximated with coarse-grained particles. The time evolution of the system is integrated using Velocity-Verlet algorithm.
Restrictions: The code is compatible with NVIDIA GPGPUs with compute capability 3.0 and above.
Unusual features: The code is implemented on GPGPUs with significantly improved speed.
Additional Comments: Github repository link https://github.com/AnselGitAccount/USERMESO-2.0
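
The solution method above names the velocity-Verlet algorithm for time integration. The following is a minimal sketch of that scheme for a single particle in one dimension, not code taken from USERMESO 2.0. The harmonic-oscillator test problem and the names `velocity_verlet`, `spring_force`, and `energy` are assumptions chosen for illustration, and the sketch uses a purely conservative force, whereas a DPD-type model also includes velocity-dependent dissipative and random forces that it does not attempt to handle.

```python
import math


def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance one particle in 1-D with the velocity-Verlet scheme."""
    a = force(x) / mass
    for _ in range(steps):
        v_half = v + 0.5 * dt * a   # half-step velocity update
        x = x + dt * v_half         # full-step position update
        a = force(x) / mass         # acceleration at the new position
        v = v_half + 0.5 * dt * a   # second half-step velocity update
    return x, v


# Hypothetical test problem: unit-mass harmonic oscillator, F = -k x.
k = 1.0


def spring_force(x):
    return -k * x


def energy(x, v):
    """Kinetic plus potential energy of the unit-mass oscillator."""
    return 0.5 * v * v + 0.5 * k * x * x


x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, spring_force, mass=1.0, dt=0.01, steps=628)
```

With these assumed parameters the integration covers t = 6.28, close to one oscillation period, so `x1` should stay near `math.cos(6.28)` and the total energy should stay near its initial value of 0.5.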

ion for MPI programs with pointer-based data structures. The main features of DRASync are: it amortizes communication among MPI processes to allow efficient parallel allocation in a global address space; it takes advantage of bulk deallocation and good locality with pointer-based data structures. Finally, DRASync supports ownership semantics of regions by MPI processes akin to reader–writer locks, which makes for a high-level, intuitive synchronization tool in MPI programs, without sacrificing message-passing performance.

In "An Evaluation of MPI Message Rate on Hybrid-Core Processors", Barrett et al. analyze the ability of simple and more complex cores to perform MPI matching operations for various scenarios in order to gain insight into how MPI implementations for future hybrid-core processors should be designed. The authors compare throughput-oriented cores to single-thread optimized cores in terms of the ability to perform MPI match processing. The intent of this study is to gain insight into the ability of throughput-oriented cores to adequately perform MPI matching and to better understand how MPI implementations on future hybrid-core processors should allocate computing resources to try to optimize performance-critical MPI operations.

Collective MPI communications must be executed in the same order, and the same number of times, by all processes in their communicator; otherwise the program does not conform to the standard and a deadlock can occur. As soon as the control flow involving these collective operations becomes more complex, in particular when it includes conditionals on process ranks, ensuring the correctness of such code is error-prone. The paper "PARCOACH: Combining Static and Dynamic Validation of MPI Collective Communications" by Saillard et al. proposes a static analysis to detect when such a situation occurs, combined with a code transformation that prevents deadlocking. They show with several benchmarks the small impact on performance and the ease of integration of their techniques in the development process.

In "Extreme-scale Computing Services Over MPI: Experiences, Observations and Features Proposal for Next Generation Message Passing Interface" by Zounmevo et al., the authors present their experiences in using MPI as a network transport for a large-scale distributed storage system. The authors discuss the features of MPI that facilitate adoption as well as aspects which require various workarounds. Based on use cases, the authors derive a wish-list for both MPI implementations and the MPI forum to facilitate the adoption of MPI in large-scale persistent services.

Finally, in the paper "Optimization of MPI collective operations on the IBM Blue Gene/Q Supercomputer", Kumar et al. present scalable algorithms to optimize MPI collective operations by taking advantage of the various features of the Blue Gene/Q torus and collective networks. The authors accelerate summing of network packets with local buffers by using the Quad Processing SIMD unit in the Blue Gene/Q cores and by executing the sums on multiple communication threads supported by the optimized communication libraries.
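
The collective-ordering rule described above (every process in a communicator must call the same collectives in the same order) can be illustrated with a toy check. This is a minimal sketch and not the PARCOACH analysis: it only compares recorded sequences of collective names, one list per rank, and ignores communicators, roots, and arguments. The function names `first_collective_mismatch` and `trace_for_rank`, and the rank-dependent branch in the example, are hypothetical.

```python
def first_collective_mismatch(traces):
    """Return the index of the first collective call at which ranks disagree.

    traces holds one list per rank, each listing collective names in call
    order. Returns None when every rank made the same sequence of calls.
    """
    longest = max((len(t) for t in traces), default=0)
    for i in range(longest):
        # A rank that has already finished its sequence contributes None.
        calls = {t[i] if i < len(t) else None for t in traces}
        if len(calls) > 1:
            return i
    return None


def trace_for_rank(rank):
    """Hypothetical program with a collective guarded by a rank conditional."""
    calls = ["MPI_Bcast"]
    if rank == 0:
        calls.append("MPI_Reduce")  # only rank 0 enters this collective
    calls.append("MPI_Barrier")
    return calls


traces = [trace_for_rank(r) for r in range(4)]
mismatch = first_collective_mismatch(traces)  # 1: MPI_Reduce vs MPI_Barrier
```

In this assumed example, rank 0 enters `MPI_Reduce` while the other ranks enter `MPI_Barrier` at the same position, which is the kind of rank-dependent divergence the paragraph above warns can deadlock.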

Articles published on MPI Communication

Reducing Inter-Process Communication Overhead in Parallel Sparse Matrix-Matrix Multiplication

Scalable communication event tracing via clustering

GPU-accelerated red blood cells simulations with transport dissipative particle dynamics

Leveraging the accelerated processing units for seismic imaging: A performance and power efficiency comparison against CPUs and GPUs

Modelling parallel overhead from simple run-time records

LU factorization on heterogeneous systems: an energy-efficient approach towards high performance

A study of the influence of VM allocation policies on MPI Bcast and MPI Exchange latency in cloud

Simulator test suite for evaluating performance of multithreaded Message passing interface execution on SUN cluster

High performance Python for direct numerical simulations of turbulent flows

A Hybrid Parallel Delaunay Image-to-mesh Conversion Algorithm Scalable on Distributed-memory Clusters

623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores

Real-time and real-space program tuned in K-computer

Leveraging OpenCoarrays to Support Coarray Fortran on IBM Power8E

Hybrid parallelization of the LIGGGHTS open-source DEM code

τ-Lop: Modeling performance of shared memory MPI

The Particle Accelerator Simulation Code PyORBIT

Recent advances in the Message Passing Interface

PARCOACH: Combining static and dynamic validation of MPI collective communications

On maximum achievable speeds for field solvers

An Extensible System for Multilevel Automatic Data Partition and Mapping
