Abstract
Current multicomputers are typically built as interconnected clusters of shared-memory multicore computers. A common programming approach for these clusters is simply to use a message-passing paradigm, launching as many processes as there are available cores. Nevertheless, to better exploit the scalability of these clusters and highly parallel multicore systems, it is necessary to use their distributed- and shared-memory hierarchies efficiently. This implies combining different programming paradigms and tools at different levels of the program design. This paper presents an approach that eases programming for mixed distributed- and shared-memory parallel computers. Coordination at the distributed-memory level is simplified using Hitmap, a library for distributed computing based on hierarchical tiling of data structures. We show how this tool can be integrated with shared-memory programming models and automatic code-generation tools to efficiently exploit the multicore environment of each multicomputer node. This approach allows the most appropriate techniques to be exploited at each level, easily generating multilevel parallel programs that automatically adapt their communication and synchronization structures to the target machine. Our experimental results show how this approach matches or even improves on the best performance results obtained with manually optimized codes using pure MPI or OpenMP models.
Highlights
The polyhedral model has proved to be a useful tool for transforming and generating parallel programs for codes with affine nested loops [1].
In this paper we study the codes generated by the most sophisticated communication scheme introduced so far.
This paper presents a model for the run-time cost of the codes generated by a state-of-the-art polyhedral-model technique (the FOP scheme) for communication management in a distributed-memory environment.
Summary
The polyhedral model has proved to be a useful tool for transforming and generating parallel programs for codes with affine nested loops [1]. The automatically generated codes are capable of coordinating the computation and communication across heterogeneous devices. This allows the exploitation of parallelism in heterogeneous clusters with GPUs or other accelerators, which is the current trend for building huge parallel systems [8]. The scale of the machines and problems that can currently be tackled grows by several orders of magnitude compared with those found in most previous performance evaluations of distributed-memory polyhedral-generated codes, and it will continue to grow, with exascale computing being an important research focus.