Many important parallel applications are data parallel, and may be efficiently implemented on a workstation cluster by allocating each workstation a contiguous partition of the data domain. Implementation on non-dedicated clusters, however, is complicated by the possibility of changes in workstation availability. For example, a personal workstation may be reclaimed by its primary user for interactive use. In such situations, a node must be removed from the collection of workstations forming the “virtual parallel machine” allocated to the application, and data redistributed accordingly. Conversely, workstations may become available to join the virtual parallel machine. This paper identifies fundamental characteristics of efficient policies for data redistribution following addition/removal of workstations from the cluster. The following conclusions are obtained based on mathematical analysis and simulations: (a) allocating data to a new node from the center of the data domain substantially reduces data migration costs compared to allocation from the edge; (b) addition in groups is beneficial compared to repeated single additions; and (c) even a large number of incremental adjustments of the data domain partitions, owing to successive additions/removals of nodes, do not appear to substantially degrade partition quality compared to that obtained by partitioning from scratch. We believe that these observations can be fruitfully incorporated in the design of workstation cluster support systems for data parallel computing.
Read full abstract