Message-passing Implementation Research Articles

Since its release, the Java programming language has attracted considerable attention from the high-performance computing (HPC) community because of its portability, high programming productivity, and built-in multithreading and networking support. As a consequence, several initiatives have been taken to develop a high-performance Java message-passing library to program distributed memory architectures, such as clusters. The performance of Java message-passing applications relies heavily on the communications performance. Thus, the design and implementation of low-level communication devices that support message-passing libraries is an important research issue in Java for HPC. MPJ Express is our Java message-passing implementation for developing high-performance parallel Java applications. Its public release currently contains three communication devices: the first one is built using the Java New Input/Output (NIO) package for TCP/IP; the second one is specifically designed for the Myrinet Express library on Myrinet; and the third one supports thread-based shared memory communications. Although these devices have been successfully deployed in many production environments, previous performance evaluations of MPJ Express suggest that the buffering layer, tightly coupled with these devices, incurs a certain degree of copying overhead, which represents one of the main performance penalties. This paper presents a more efficient Java message-passing communications device, based on Java Input/Output sockets, that avoids this buffering overhead. Moreover, this device implements several strategies, both in the communication protocol and in the HPC hardware support, that optimize Java message-passing communications.
To evaluate its benefits, this paper compares the performance of this device with that of other Java and native message-passing libraries on various high-speed networks, such as Gigabit Ethernet, Scalable Coherent Interface, Myrinet, and InfiniBand, as well as on a shared memory multicore scenario. The reported communication overhead reduction encourages the upcoming incorporation of this device in MPJ Express (). Copyright © 2011 John Wiley & Sons, Ltd.
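
As a rough illustration of message passing over Java Input/Output sockets, the sketch below sends a length-prefixed array of doubles across a loopback TCP connection using only java.io and java.net. It is a minimal sketch under stated assumptions, not the device described in the abstract: the class name LoopbackMessage and the methods send, receive, and roundTrip are hypothetical, the wire format is invented for this example, and none of the paper's protocol or hardware-specific optimizations are reproduced.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;

/** Hypothetical illustration only; not part of MPJ Express. */
public class LoopbackMessage {

    /** Writes a message as an element count followed by the elements. */
    public static void send(DataOutputStream out, double[] data) throws IOException {
        out.writeInt(data.length);      // header: number of elements (assumed format)
        for (double d : data) {
            out.writeDouble(d);         // payload: one 8-byte value per element
        }
        out.flush();
    }

    /** Reads a message written by send(). */
    public static double[] receive(DataInputStream in) throws IOException {
        int n = in.readInt();
        double[] data = new double[n];
        for (int i = 0; i < n; i++) {
            data[i] = in.readDouble();
        }
        return data;
    }

    /** Sends data from a helper thread to this thread over a loopback socket. */
    public static double[] roundTrip(double[] data) throws Exception {
        InetAddress local = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, local)) {
            Thread sender = new Thread(() -> {
                try (Socket s = new Socket(local, server.getLocalPort());
                     DataOutputStream out = new DataOutputStream(
                             new BufferedOutputStream(s.getOutputStream()))) {
                    send(out, data);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();
            try (Socket peer = server.accept();
                 DataInputStream in = new DataInputStream(
                         new BufferedInputStream(peer.getInputStream()))) {
                double[] received = receive(in);
                sender.join();
                return received;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(roundTrip(new double[] {1.0, 2.5, -3.0})));
    }
}
```

The stream wrappers here still stage bytes internally, so this shows only the basic send and receive pattern; it makes no claim about the copy avoidance the abstract reports.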


With the emergence of commodity multicore architectures, exploiting tightly-coupled parallelism has become increasingly important. Functional programming languages, such as Haskell, are, in principle, well placed to take advantage of this trend, offering the ability to easily identify large amounts of fine-grained parallelism. Unfortunately, real performance benefits have often proved hard to realise in practice. This paper reports on a new approach using middleware that has been constructed using the Eden parallel dialect of Haskell. Our approach is "low pain" in the sense that the programmer constructs a parallel program by inserting a small number of higher-order algorithmic skeletons at key points in the program. It is "high gain" in the sense that we are able to get good parallel speedups. Our approach is unusual in that we do not attempt to use shared memory directly, but rather coordinate parallel computations using a message-passing implementation. This approach has a number of advantages. Firstly, coordination, i.e. locking and communication, is both confined to limited shared memory areas, essentially the communication buffers, and isolated within well-understood libraries. Secondly, the coarse thread granularity that we obtain reduces coordination overheads, so locks are normally needed only on (relatively large) messages, and not on individual data items, as is often the case for simple shared-memory implementations. Finally, cache coherency requirements are reduced since individual tasks do not share caches, and can garbage collect independently. We report results for two representative computational algebra problems. Computational algebra is a challenging application area that has not been widely studied in the general parallelism community.
Computational algebra applications have high computational demands, and are, in principle, often suitable for parallel execution, but usually display a high degree of irregularity in terms of both task and data structure. This makes it difficult to construct parallel applications that perform well in practice. Using our system, we are able to obtain both extremely good processor utilisation (97%) and very good absolute speedups (up to 7.7) on an eight-core machine.
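
The idea of a map skeleton coordinated purely by messages can be sketched as follows. This is a Java analogue for illustration only: the work described in the abstract uses Eden skeletons in Haskell, and the names MessagePassingMap, Message, and parMap are hypothetical. Workers here are threads in one JVM, so unlike the tasks in the abstract they share a heap and a garbage collector; the sketch shows only that synchronisation is confined to the two message queues.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

/** Hypothetical illustration of a message-passing map skeleton. */
public class MessagePassingMap {

    /** A task or result message: the element's position plus its payload. */
    static final class Message<T> {
        final int index;
        final T payload;
        Message(int index, T payload) {
            this.index = index;
            this.payload = payload;
        }
    }

    /** Applies f to every input on `workers` threads that only exchange messages. */
    public static <A, B> List<B> parMap(Function<A, B> f, List<A> inputs, int workers)
            throws InterruptedException {
        if (workers < 1) {
            throw new IllegalArgumentException("workers must be at least 1");
        }
        BlockingQueue<Message<A>> tasks = new LinkedBlockingQueue<>();
        BlockingQueue<Message<B>> results = new LinkedBlockingQueue<>();

        List<Thread> pool = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                try {
                    // A negative index is the stop message.
                    for (Message<A> m = tasks.take(); m.index >= 0; m = tasks.take()) {
                        results.put(new Message<B>(m.index, f.apply(m.payload)));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            pool.add(t);
        }

        // Send one task message per input, then one stop message per worker.
        for (int i = 0; i < inputs.size(); i++) {
            tasks.put(new Message<A>(i, inputs.get(i)));
        }
        for (int w = 0; w < workers; w++) {
            tasks.put(new Message<A>(-1, null));
        }

        // Collect results in arrival order and store them by original position.
        List<B> out = new ArrayList<B>(Collections.<B>nCopies(inputs.size(), null));
        for (int i = 0; i < inputs.size(); i++) {
            Message<B> r = results.take();
            out.set(r.index, r.payload);
        }
        for (Thread t : pool) {
            t.join();
        }
        return out;
    }
}
```

The caller inserts a single call such as parMap(f, inputs, 8) where a sequential map would otherwise appear. The sketch assumes f does not throw; a failing task would leave the collector waiting, which a fuller version would need to handle.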


Articles published on Message-passing Implementation

Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures

Reproducibility strategies for parallel Preconditioned Conjugate Gradient

Communication-efficient randomized consensus

Distributed Reservoir Computing with Sparse Readouts [Research Frontier]

Multilevel Balancing Domain Decomposition at Extreme Scales

Exploiting task and data parallelism in ILUPACK’s preconditioned CG solver on NUMA architectures and many-core accelerators

Automatic tuning of sparse matrix-vector multiplication on multicore clusters

High-performance computing selection of models of DNA substitution for multicore clusters

Deterministic Message Passing for Distributed Parallel Computing

Design of Scalable Java Communication Middleware for Multi-Core Systems

Device level communication libraries for high-performance computing in Java

A Heuristic Approach for the Automatic Insertion of Checkpoints in Message-Passing Codes

Low-pain, high-gain multicore programming in Haskell

Using bisimulation proof techniques for the analysis of distributed abstract machines

Array files for computational chemistry: MP2 energies

Simulating Radiating and Magnetized Flows in Multiple Dimensions with ZEUS‐MP

Parallel Sn Sweeps on Unstructured Grids: Algorithms for Prioritization, Grid Partitioning, and Cycle Detection

Performance of a new CFD flow solver using a hybrid programming paradigm

A weakest failure detector-based asynchronous consensus protocol for f < n

Design and Implementation of Parallel Dynamic Shortest Path Algorithms for Intelligent Transportation Systems Applications
