Abstract

In this paper we compare different parallel implementations of the same algorithm for solving nonlinear simulation problems on unstructured meshes. The first implementation uses the message-passing programming model and the PVM system to implement a domain decomposition of the unstructured mesh, while the second exploits the inherent parallelism of the algorithm by adopting the shared-memory programming model. Both implementations are applied to the preconditioned GMRES method, which iteratively solves the system of linear equations. A combined approach, the hybrid programming model suited to multicomputers with SMP nodes, is also introduced. For performance measurements we use a compressible fluid flow simulation in which sequences of finite element solutions form time approximations to the Euler equations. The tests are performed on HP SPP1600, HP S2000 and SGI Origin2000 multiprocessors and report wall-clock execution time and speedup for different numbers of processing nodes and for different meshes. Experimentally, the explicit programming model proves more efficient than the implicit model by 20-70%, depending on the mesh and the machine.
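To make the solver at the core of both implementations concrete, the following is a minimal sequential sketch of restarted GMRES with optional left preconditioning, in the spirit of the method the abstract describes. It is not the authors' parallel implementation: the `M_inv` preconditioner argument and the Jacobi preconditioner used in the usage example are illustrative assumptions, since the paper's preconditioner is not specified here.

```python
import numpy as np

def gmres(A, b, x0=None, restart=20, tol=1e-8, maxiter=100, M_inv=None):
    """Restarted GMRES(m) with optional left preconditioning.

    M_inv: callable applying the inverse of a preconditioner M
    (hypothetical stand-in; any left preconditioner can be plugged in).
    """
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    if M_inv is None:
        M_inv = lambda v: v
    bnorm = np.linalg.norm(M_inv(b))
    if bnorm == 0.0:
        bnorm = 1.0
    for _ in range(maxiter):
        r = M_inv(b - A @ x)                  # preconditioned residual
        beta = np.linalg.norm(r)
        if beta / bnorm < tol:
            return x
        m = restart
        V = np.zeros((n, restart + 1))        # Krylov basis
        H = np.zeros((restart + 1, restart))  # upper Hessenberg matrix
        V[:, 0] = r / beta
        for j in range(restart):
            w = M_inv(A @ V[:, j])
            for i in range(j + 1):            # modified Gram-Schmidt
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:           # happy breakdown
                m = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        # Solve the small least-squares problem min ||beta*e1 - H y||
        e1 = np.zeros(m + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:m + 1, :m], e1, rcond=None)
        x += V[:, :m] @ y                     # update along Krylov basis
    return x
```

A usage sketch with a simple (assumed) Jacobi preconditioner, i.e. dividing by the matrix diagonal: `x = gmres(A, b, M_inv=lambda v: v / np.diag(A))`. In a message-passing setting the matrix-vector product `A @ V[:, j]` and the inner products in the Gram-Schmidt loop are the operations that require communication across subdomains, which is why domain decomposition maps naturally onto this algorithm.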

Highlights

  • Architecture details of parallel computers make it possible to define a diversity of machine taxonomies; one of the most important factors is the organization of the address space

  • In this paper we compare different parallel implementations of the same algorithm for solving nonlinear simulation problems on unstructured meshes using the adaptive finite element approach. For this purpose we developed a parallel adaptive finite element algorithm in a version designed to solve the compressible Euler equations of inviscid fluid flow

  • We tested our parallel versions of GMRES with an example of flow simulations, a well-known transient benchmark problem: the ramp problem [54]

Introduction

Architecture details of parallel computers make it possible to define a diversity of machine taxonomies (see for example [10,28,51]); one of the most important factors is the organization of the address space. Besides the two classical organizations, i.e. the shared address space and the distributed memory architecture, there is a wide class of machines with a virtually shared, physically distributed memory organization (often called Distributed Shared Memory, DSM, machines). DSM can be implemented in software or in hardware; the latter offers better performance characteristics and comprises several typical classes, such as cc-NUMA (cache-coherent non-uniform memory access), COMA (cache-only memory architecture) and RMS (reflective memory systems). At present cc-NUMA implementations are commercially the most popular. According to Flynn's taxonomy [17] they belong to the Multiple Instruction/Multiple Data (MIMD) class of computers. A similar approach is implemented in IBM RS/6000 SP computers with SMP nodes.
