Abstract

In this work we make a strong case for remote memory access (RMA) as the effective way to program a parallel computer by proposing a framework that supports RMA in a library-independent, simple, and intuitive way. Parallel code written with our approach runs transparently under MPI-2-enabled libraries as well as under bulk-synchronous parallel libraries. The advantages of using RMA are code simplicity, reduced programming complexity, and increased efficiency. We support the latter claims by implementing under this framework a collection of benchmark programs consisting of a communication and synchronization performance-assessment program, a dense matrix multiplication algorithm, and two variants of a parallel radix-sort algorithm, and by examining their performance on a Linux-based PC cluster under three different RMA-enabled libraries: LAM MPI, BSPlib, and PUB. We conclude that implementations of such parallel algorithms using RMA communication primitives lead to code that is as efficient as the equivalent message-passing code, and in the case of radix sort substantially more efficient. In addition, our work can be used as a comparative study of the relevant capabilities of the three libraries.
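
To make the code-simplicity claim concrete, the following minimal sketch (our illustration, not code taken from the paper's framework) shows a one-sided exchange using MPI-2 RMA: each process exposes a buffer as a window, and its neighbour writes into it with MPI_Put; the target issues no receive call, unlike the matching MPI_Recv that two-sided message passing would require.

    /* Minimal one-sided MPI-2 sketch (illustrative only, not the paper's code). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, local = 0, value;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Expose 'local' for remote access by all processes. */
        MPI_Win_create(&local, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        value = rank;
        MPI_Win_fence(0, win);
        /* Write our rank into the right neighbour's window; the target
         * posts no receive, in contrast to two-sided MPI_Send/MPI_Recv. */
        MPI_Put(&value, 1, MPI_INT, (rank + 1) % size, 0, 1, MPI_INT, win);
        MPI_Win_fence(0, win);

        printf("process %d received %d\n", rank, local);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }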

Highlights

  • In the past years several parallel computing models have been proposed such as the CGM [7], LogP [6], BSP [24], and QSM [15] for the design of parallel algorithms and the programming of parallel computers

  • We made a case for remote memory access as the effective way to program a parallel computer by proposing a robust programmatic framework that supports RMA in a library-independent, simple, intuitive, and portable way using the C programming language (a minimal sketch of such a layer follows this list)

  • We examined the performance of these programs on a Linux-based PC cluster under three different RMA-enabled libraries: LAM Message Passing Interface (MPI), BSPlib, and PUB-Library, and under LAM MPI for two-sided communication
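
The paper's actual interface is not reproduced here; the sketch below only illustrates how a library-independent put/synchronize pair could be layered over either MPI-2 one-sided communication or BSPlib, selected at compile time. The names fw_put, fw_sync, fw_win, and fw_area, and the USE_MPI2 flag, are hypothetical.

    /* Hypothetical library-independent RMA layer (illustrative sketch). */
    #ifdef USE_MPI2
    #include <mpi.h>
    extern MPI_Win fw_win;          /* window created over the registered area */

    void fw_put(int pid, const void *src, int offset, int nbytes)
    {
        MPI_Put((void *)src, nbytes, MPI_BYTE,
                pid, (MPI_Aint)offset, nbytes, MPI_BYTE, fw_win);
    }

    void fw_sync(void) { MPI_Win_fence(0, fw_win); }

    #else  /* BSPlib */
    #include "bsp.h"
    extern void *fw_area;           /* address previously passed to bsp_push_reg */

    void fw_put(int pid, const void *src, int offset, int nbytes)
    {
        bsp_put(pid, src, fw_area, offset, nbytes);
    }

    void fw_sync(void) { bsp_sync(); }
    #endif

Under such a scheme the same application source compiles against either library; PUB exposes a similar put primitive, so a third branch could follow the same pattern.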


Summary

Introduction

In the past years several parallel computing models have been proposed, such as the CGM [7], LogP [6], BSP [24], and QSM [15], for the design of parallel algorithms and the programming of parallel computers. At the same time a number of parallel libraries have become available that allow portable programming on a variety of parallel hardware platforms. Most of these libraries are totally independent of these programming models; libraries based on the Message Passing Interface (MPI), such as the freely available LAM MPI [17] and MPICH [22] or commercial ones such as WMPI [5], and other libraries such as the Parallel Virtual Machine (PVM) [9] fall into this category, all of which offer extensive library features of several hundred function calls. Our benchmarks include two variants of a parallel radix-sort algorithm; the second is an enhanced version of the straightforward parallelization in which messages are combined before being sent out, so that each processor sends and receives one long message (the combining step is sketched below). We carry out this experimental study for one additional reason: it offers a comparison of the communication performance of the three libraries on a cluster of PC workstations under a realistic set of benchmark programs. MPI in general, and LAM MPI in particular, needs to be downsized into a smaller set of library calls that are more optimized and fine-tuned and give less freedom of choice to the average programmer.
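
The message-combining idea mentioned above can be pictured as follows. This is our own illustrative sketch, not the paper's implementation: it assumes the hypothetical fw_put wrapper from the earlier sketch, that the number of digit values per pass equals the number of processors, and that each sender's offset inside every receiver's buffer (remote_off) has already been agreed, for example by exchanging counts in a previous superstep.

    /* Illustrative sketch of message combining for one radix-sort pass. */
    #include <stdlib.h>

    extern void fw_put(int pid, const void *src, int offset, int nbytes);

    /* keys[0..n-1]: local keys; nprocs: number of processors;
     * shift: bit position of the digit examined in this pass;
     * remote_off[p]: where this processor's block starts in p's buffer. */
    void combined_scatter(const unsigned *keys, int n, int nprocs,
                          int shift, const int *remote_off)
    {
        int *count = calloc(nprocs, sizeof *count);
        int *start = calloc(nprocs, sizeof *start);
        int *next  = calloc(nprocs, sizeof *next);
        unsigned *packed = malloc(n * sizeof *packed);

        /* Count how many keys are destined for each processor. */
        for (int i = 0; i < n; i++)
            count[(keys[i] >> shift) % nprocs]++;

        /* Prefix sums mark where each destination's block begins. */
        for (int p = 1; p < nprocs; p++)
            start[p] = start[p - 1] + count[p - 1];

        /* Pack: all keys for processor p form one contiguous run. */
        for (int i = 0; i < n; i++) {
            int p = (keys[i] >> shift) % nprocs;
            packed[start[p] + next[p]++] = keys[i];
        }

        /* One long put per destination instead of one put per key. */
        for (int p = 0; p < nprocs; p++)
            if (count[p] > 0)
                fw_put(p, &packed[start[p]], remote_off[p],
                       count[p] * (int)sizeof(unsigned));

        free(count); free(start); free(next); free(packed);
    }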

A framework for parallel programming
The As suite
The Mult suite
The RDx suite
Conclusion
Speedup Results for LAM MPI
