Abstract

In this work we make a strong case for remote memory access (RMA) as the effective way to program a parallel computer by proposing a framework that supports RMA in a library-independent, simple, and intuitive way. Parallel code written with our approach runs transparently under MPI-2-enabled libraries as well as under bulk-synchronous parallel libraries. The advantages of using RMA are code simplicity, reduced programming complexity, and increased efficiency. We support the latter claims by implementing under this framework a collection of benchmark programs consisting of a communication and synchronization performance-assessment program, a dense matrix multiplication algorithm, and two variants of a parallel radix-sort algorithm, and by examining their performance on a Linux-based PC cluster under three different RMA-enabled libraries: LAM MPI, BSPlib, and PUB. We conclude that implementations of such parallel algorithms using RMA communication primitives lead to code that is as efficient as the equivalent message-passing code, and in the case of radix sort substantially more efficient. In addition, our work can be used as a comparative study of the relevant capabilities of the three libraries.
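
To make the code-simplicity claim concrete, the following minimal sketch (our illustration, not code taken from the paper's framework) shows a one-sided exchange using MPI-2 RMA: each process exposes a buffer as a window, and its neighbour writes into it with MPI_Put; the target issues no receive call, unlike the matching MPI_Recv that two-sided message passing would require.

    /* Minimal one-sided MPI-2 sketch (illustrative only, not the paper's code). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, local = 0, value;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Expose 'local' for remote access by all processes. */
        MPI_Win_create(&local, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        value = rank;
        MPI_Win_fence(0, win);
        /* Write our rank into the right neighbour's window; the target
         * posts no receive, in contrast to two-sided MPI_Send/MPI_Recv. */
        MPI_Put(&value, 1, MPI_INT, (rank + 1) % size, 0, 1, MPI_INT, win);
        MPI_Win_fence(0, win);

        printf("process %d received %d\n", rank, local);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }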

Highlights

  • In the past years several parallel computing models have been proposed such as the CGM [7], LogP [6], BSP [24], and QSM [15] for the design of parallel algorithms and the programming of parallel computers

  • We made a case for remote memory access as the effective way to program a parallel computer by proposing a robust programmatic framework that supports RMA in a library-independent, simple, intuitive, and portable way using the C programming language (a minimal sketch of such a layer follows this list)

  • We examined the performance of these programs on a Linux-based PC cluster under three different RMA-enabled libraries: LAM Message Passing Interface (MPI), BSPlib, and PUB-Library, and under LAM MPI for two-sided communication
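
The paper's actual interface is not reproduced here; the sketch below only illustrates how a library-independent put/synchronize pair could be layered over either MPI-2 one-sided communication or BSPlib, selected at compile time. The names fw_put, fw_sync, fw_win, and fw_area, and the USE_MPI2 flag, are hypothetical.

    /* Hypothetical library-independent RMA layer (illustrative sketch). */
    #ifdef USE_MPI2
    #include <mpi.h>
    extern MPI_Win fw_win;          /* window created over the registered area */

    void fw_put(int pid, const void *src, int offset, int nbytes)
    {
        MPI_Put((void *)src, nbytes, MPI_BYTE,
                pid, (MPI_Aint)offset, nbytes, MPI_BYTE, fw_win);
    }

    void fw_sync(void) { MPI_Win_fence(0, fw_win); }

    #else  /* BSPlib */
    #include "bsp.h"
    extern void *fw_area;           /* address previously passed to bsp_push_reg */

    void fw_put(int pid, const void *src, int offset, int nbytes)
    {
        bsp_put(pid, src, fw_area, offset, nbytes);
    }

    void fw_sync(void) { bsp_sync(); }
    #endif

Under such a scheme the same application source compiles against either library; PUB exposes a similar put primitive, so a third branch could follow the same pattern.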


Summary

Introduction

In the past years several parallel computing models have been proposed, such as the CGM [7], LogP [6], BSP [24], and QSM [15], for the design of parallel algorithms and the programming of parallel computers. At the same time a number of parallel libraries have become available that allow portable programming on a variety of parallel hardware platforms. Most of these libraries are totally independent of these programming models; libraries based on the Message Passing Interface (MPI), such as the freely available LAM MPI [17] and MPICH [22] or commercial ones such as WMPI [5], and other libraries such as the Parallel Virtual Machine (PVM) [9] fall into this category, all of which offer extensive library features of several hundred function calls. Our benchmarks include two variants of a parallel radix-sort algorithm; the second is an enhanced version of the straightforward parallelization in which messages are combined before being sent out, so that each processor sends and receives one long message (the combining step is sketched below). We carry out this experimental study for one additional reason: it offers a comparison of the communication performance of the three libraries on a cluster of PC workstations under a realistic set of benchmark programs. MPI in general, and LAM MPI in particular, needs to be downsized into a smaller set of library calls that are more optimized and fine-tuned and give less freedom of choice to the average programmer.
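
The message-combining idea mentioned above can be pictured as follows. This is our own illustrative sketch, not the paper's implementation: it assumes the hypothetical fw_put wrapper from the earlier sketch, that the number of digit values per pass equals the number of processors, and that each sender's offset inside every receiver's buffer (remote_off) has already been agreed, for example by exchanging counts in a previous superstep.

    /* Illustrative sketch of message combining for one radix-sort pass. */
    #include <stdlib.h>

    extern void fw_put(int pid, const void *src, int offset, int nbytes);

    /* keys[0..n-1]: local keys; nprocs: number of processors;
     * shift: bit position of the digit examined in this pass;
     * remote_off[p]: where this processor's block starts in p's buffer. */
    void combined_scatter(const unsigned *keys, int n, int nprocs,
                          int shift, const int *remote_off)
    {
        int *count = calloc(nprocs, sizeof *count);
        int *start = calloc(nprocs, sizeof *start);
        int *next  = calloc(nprocs, sizeof *next);
        unsigned *packed = malloc(n * sizeof *packed);

        /* Count how many keys are destined for each processor. */
        for (int i = 0; i < n; i++)
            count[(keys[i] >> shift) % nprocs]++;

        /* Prefix sums mark where each destination's block begins. */
        for (int p = 1; p < nprocs; p++)
            start[p] = start[p - 1] + count[p - 1];

        /* Pack: all keys for processor p form one contiguous run. */
        for (int i = 0; i < n; i++) {
            int p = (keys[i] >> shift) % nprocs;
            packed[start[p] + next[p]++] = keys[i];
        }

        /* One long put per destination instead of one put per key. */
        for (int p = 0; p < nprocs; p++)
            if (count[p] > 0)
                fw_put(p, &packed[start[p]], remote_off[p],
                       count[p] * (int)sizeof(unsigned));

        free(count); free(start); free(next); free(packed);
    }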

A framework for parallel programming
The As suite
The Mult suite
The RDx suite
Conclusion
Speedup Results for LAM MPI
