Abstract

The availability of large distributed memory multiprocessors and parallel computers with complex memory hierarchies have left the programmer with the difficult task of planning the detailed parallel execution of a program; for example, in the case of distributed memory machines, the programmer is forced to manually distribute code and data in addition to managing communication among tasks explicitly. Current work in compiler support for this task has focused on automating task partitioning, assuming a fixed data partition or a programmer-specified (in the form of annotations) data partition. This paper argues that one of the reasons for inefficient parallel execution is the lack of synergism between task partitioning and data partitioning and allocation; hence data and task allocation should both be influenced by the inherent dependence structure of the computation (which is the source of synergism). We present a methodology based on a unified approach to task and data partitioning; we show how to derive data and task partitions for computations expressed as nested loops that exhibit regular dependencies and how to map these onto distributed memory multiprocessors. Based on the mapping, we show how to derive code for the nodes of a distributed memory machine with appropriate message transmission constructs. We also discuss related communication optimizations.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call