Optimal expression evaluation for data parallel architectures

John R Gilbert

doi:10.1016/0743-7315(91)90109-m

Abstract

A data parallel machine represents an array or another composite data structure by allocating one processor (at least conceptually) per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of the operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. We give an efficient algorithm to find the minimum-cost way to evaluate an expression, for several different data parallel architectures. Our algorithm applies to any architecture in which the metric describing the cost of moving an array has a property we call “robustness.” This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes. We remark on several variations of the problem, some of which we solve and some of which remain open.
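To make the problem statement concrete, the following is a minimal brute-force sketch. It is not the algorithm of the paper and does not use the "robustness" property. It assumes, purely for illustration, that an array's placement is a single integer offset, that the candidate placements form a small finite set supplied by the caller, and that moving an array from one offset to another costs the absolute difference. The names `min_eval_cost`, `candidates`, and `dist` are hypothetical.

```python
def min_eval_cost(tree, candidates, dist):
    """Minimum total movement cost to evaluate an expression tree.

    tree       -- a leaf (the position where an operand array is stored)
                  or a tuple of subtrees (one pointwise operation)
    candidates -- finite, re-iterable collection of allowed positions (assumption)
    dist       -- dist(a, b): cost of moving an array from a to b (assumption)
    """
    def avail(node):
        # avail(node)[p] = least cost to have node's value located at p
        if not isinstance(node, tuple):
            # Leaf: the stored array is moved from its home position to p.
            return {p: dist(node, p) for p in candidates}
        child_tables = [avail(child) for child in node]
        # Performing the operation at q requires every operand to be at q.
        here = {q: sum(t[q] for t in child_tables) for q in candidates}
        # The result may then be moved from q to p.
        return {p: min(here[q] + dist(q, p) for q in candidates)
                for p in candidates}

    # The final result may be left wherever it is cheapest.
    return min(avail(tree).values())


# Illustrative 1-D metric: cost is the distance between offsets.
offset_distance = lambda a, b: abs(a - b)

# Operands stored at offsets 0 and 4 are combined, then combined with
# another operand stored at offset 4.
cost = min_eval_cost(((0, 4), 4), range(5), offset_distance)
```

In this example, performing the inner operation at offset 0 and then moving its result to offset 4 would cost 8 in total, whereas performing both operations at offset 4 costs 4, which is the value the sketch returns. The table-based search is quadratic in the number of candidate positions per tree node, so it only illustrates the choice being optimized, not an efficient method.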
