Abstract

Current multicomputers are typically built as interconnected clusters of shared-memory multicore computers. A common programming approach for these clusters is simply to use a message-passing paradigm, launching as many processes as there are cores. Nevertheless, to better exploit the scalability of these clusters and highly parallel multicore systems, it is necessary to use their distributed- and shared-memory hierarchies efficiently. This implies combining different programming paradigms and tools at different levels of the program design. This paper presents an approach to ease programming for mixed distributed- and shared-memory parallel computers. Coordination at the distributed-memory level is simplified using Hitmap, a library for distributed computing based on hierarchical tiling of data structures. We show how this tool can be integrated with shared-memory programming models and automatic code-generation tools to efficiently exploit the multicore environment of each multicomputer node. This approach allows the most appropriate techniques to be applied at each level, easily generating multilevel parallel programs that automatically adapt their communication and synchronization structures to the target machine. Our experimental results show how this approach matches or even improves on the best performance results obtained with manually optimized codes using pure MPI or OpenMP models.

Highlights

  • The polyhedral model has proven to be a useful tool to transform and generate parallel programs for codes with affine nested loops [1]

  • In this paper we study the codes generated by the most sophisticated communication scheme introduced so far

  • This paper presents a model for the run-time cost of the codes generated by a state-of-the-art polyhedral-model technique (FOP scheme), for communication management in a distributed-memory environment


Summary

INTRODUCTION

The polyhedral model has proven to be a useful tool to transform and generate parallel programs for codes with affine nested loops [1]. The automatically generated codes are capable of coordinating the computation and communication across heterogeneous devices. This allows the exploitation of parallelism in heterogeneous clusters with GPUs or other accelerators, which is the current trend for building huge parallel systems [8]. The scale of the machines and problems that can currently be tackled grows by several orders of magnitude compared with those found in most previous performance evaluations of distributed-memory polyhedral-generated codes, and it will continue to grow, with exascale computing being an important research focus.
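For readers unfamiliar with the term, an "affine nested loop" is one whose loop bounds and array subscripts are affine (linear plus constant) functions of the loop indices. A 1-D Jacobi stencil, shown below as an illustrative example (not taken from the paper's benchmarks), is the classic case: its iteration-space dependences can be analyzed exactly, which is what lets polyhedral tools tile and distribute the nest automatically.

```c
/* Example of an affine loop nest of the kind the polyhedral model
 * targets: a 1-D Jacobi stencil. Loop bounds (0 <= t < T, 1 <= i < N-1)
 * and subscripts (i-1, i, i+1) are all affine functions of the indices. */
#include <string.h>

#define N 16
#define T 4

void jacobi1d(double a[N]) {
    double b[N];
    for (int t = 0; t < T; t++) {              /* outer time loop        */
        for (int i = 1; i < N - 1; i++)        /* affine inner bounds    */
            b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0; /* affine accesses */
        memcpy(&a[1], &b[1], (N - 2) * sizeof(double)); /* commit sweep  */
    }
}
```

Because every dependence here has a constant affine distance, a polyhedral compiler can partition the (t, i) iteration space into tiles, assign tiles to processes, and generate the pack/send/unpack communication code for the tile boundaries, which is precisely the kind of generated code whose run-time cost this paper models.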

THE COMMUNICATION SCHEME
COST MODEL
General cost for a distributed loop
Problem size and number of iterations
Distribution policy
Packing stage
Coordination and communication stage
Unpacking stage
Total cost
CASE STUDY
Cost model parametrization
Simulation study
Experimental environment
Results
Findings
CONCLUSION
