Abstract
Current High Performance Computing (HPC) systems are typically built as interconnected clusters of shared-memory multicore computers. Several techniques to automatically generate parallel programs from high-level parallel languages or sequential codes have been proposed. To properly exploit the scalability of HPC clusters, these techniques should take into account the combination of data communication across distributed memory, and the exploitation of shared-memory models. In this paper, we present a new communication calculation technique to be applied across different SPMD (Single Program Multiple Data) code blocks, containing several uniform data access expressions. We have implemented this technique in Trasgo, a programming model and compilation framework that transforms parallel programs from a high-level parallel specification that deals with parallelism in a unified, abstract, and portable way. The proposed technique computes at runtime exact coarse-grained communications for distributed message-passing processes. Applying this technique at runtime has the advantage of being independent of compile-time decisions, such as the tile size chosen for each process. Our approach allows the automatic generation of pre-compiled multi-level parallel routines, libraries, or programs that can adapt their communication, synchronization, and optimization structures to the target system, even when computing nodes have different capabilities. Our experimental results show that, despite our runtime calculation, our approach can automatically produce efficient programs compared with MPI reference codes, and with codes generated with auto-parallelizing compilers.
Highlights
Parallel machines are becoming more heterogeneous, mixing devices with different capabilities in the context of hybrid clusters, with hierarchical shared- and distributed-memory levels
Using current parallel programming models (e.g. Message Passing Interface (MPI), OpenMP, Intel TBB, Cilk, and PGAS languages such as Chapel, X10, or UPC), the application programmer still faces many important decisions not related to the parallel algorithms, but to implementation issues that are key to obtaining efficient programs
We present a new communication calculation technique to be applied across different SPMD (Single Program Multiple Data) code blocks that contain several different data access expressions to the same data structure, whose indexes are calculated with uniform affine expressions in the index selectors (see the sketch below)
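The following is a minimal sketch, in plain C with illustrative names not taken from Trasgo, of the kind of code the technique targets: two SPMD code blocks over the same distributed array, where every access index is a uniform affine expression of the loop index (i-1, i, i+1), so the data written by one block and read by the next may reside on neighbouring processes.

/* Minimal sketch (hypothetical names): two SPMD blocks over a
 * block-distributed array.  Every access index is a uniform affine
 * expression of the loop index, the pattern targeted by the technique. */
#include <stddef.h>

void spmd_blocks(double *a, double *b, size_t lo, size_t hi, size_t n)
{
    /* Block 1: each process writes its mapped index range [lo, hi). */
    for (size_t i = lo; i < hi; i++)
        a[i] = 0.5 * b[i];

    /* Block 2: reads a[i-1] and a[i+1], which may have been written by
     * neighbouring processes in Block 1, so those border elements must be
     * communicated before this block can execute. */
    for (size_t i = lo; i < hi; i++) {
        if (i > 0 && i < n - 1)
            b[i] = 0.5 * (a[i - 1] + a[i + 1]);
    }
}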
Summary
Parallel machines are becoming more heterogeneous, mixing devices with different capabilities in the context of hybrid clusters, with hierarchical shared- and distributed-memory levels. The work presented in [2] proposes a technique that, from a sequential code, generates a low-level parallel code for distributed-memory systems using the Message Passing Interface (MPI) library. This technique improves previous schemes because the code it generates is parametric in the number of processes and problem sizes, reducing the communicated volume of data. Our technique is coarse-grained in the sense that communication calculation across two parallel SPMD blocks is done once for the whole index space mapped to a process at runtime, independently of the number or sizes of tiles generated inside the process. This enables different tile sizes to be used in the same computation at the same hierarchical level, an important feature in achieving good performance on heterogeneous systems that include machines with different architectures [6]. A minimal sketch of this runtime calculation is given below.
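As an illustration only, the following sketch shows the idea of a coarse-grained runtime communication calculation for a 1-D block-distributed array and a single affine access offset. The names (owned, to_receive), the block distribution, and the 1-D case are assumptions made for the example; they are not Trasgo's actual interface.

/* Hypothetical 1-D sketch: at run time each process computes, once for its
 * whole mapped index range, which remote-owned elements it needs for an
 * affine access a[i + off].  The result is one coarse-grained message per
 * neighbour, independent of how the local range is later tiled. */
#include <stdio.h>

typedef struct { long lo, hi; } range_t;           /* half-open [lo, hi) */

/* Index range owned by process p under a block distribution of n over np. */
static range_t owned(long n, int np, int p)
{
    long chunk = (n + np - 1) / np;
    range_t r = { p * chunk, (p + 1) * chunk };
    if (r.hi > n) r.hi = n;
    return r;
}

/* Elements that process p must receive from process q for the access a[i+off]:
 * the intersection of p's shifted footprint with q's owned range. */
static range_t to_receive(long n, int np, int p, int q, long off)
{
    range_t mine = owned(n, np, p), theirs = owned(n, np, q);
    range_t need = { mine.lo + off, mine.hi + off };
    range_t r = { need.lo > theirs.lo ? need.lo : theirs.lo,
                  need.hi < theirs.hi ? need.hi : theirs.hi };
    if (r.lo >= r.hi) r.lo = r.hi = 0;              /* empty: no message */
    return r;
}

int main(void)
{
    long n = 1000; int np = 4;
    for (int p = 1; p < np; p++) {
        range_t r = to_receive(n, np, p, p - 1, -1);
        printf("process %d receives [%ld, %ld) from process %d\n",
               p, r.lo, r.hi, p - 1);
    }
    return 0;
}

For example, with 1000 elements distributed over 4 processes and the access a[i-1], each process computes at run time a single one-element receive range from its left neighbour, and this result does not depend on how its local range is later split into tiles.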