Abstract

Current MIMD computers support the execution of data parallel programs by providing a tree network to perform fast barrier synchronizations. However, there are two major limitations to using tree networks: the first arises due to control nesting in programs, and the second arises when the MIMD computer needs to run several programs simultaneously. First, we present two hardware barrier synchronization schemes which can support deep levels of control nesting in data parallel programs. Hardware barriers are usually an order of magnitude faster than software implementations. Since large data parallel programs often have several levels of nested barriers, these schemes provide significant speedups in the execution of such programs on MIMD computers. The first scheme performs code transformations and uses two single-bit trees to implement unlimited levels of nested barriers. However, this scheme increases the code size. The second scheme uses a more expensive integer tree to support an exponential number of nesting levels without increasing the code size: we show that up to n nested barriers can be supported by a network with bisection bandwidth O(log n) and a latency of O(log p log n) gate delays. Using tree network hardware already available on commercial MIMD computers, this scheme can support more than four billion levels of nesting. Second, we present a design for a barrier synchronization network that is free from the partitioning constraints imposed by barrier trees. When the MIMD computer is partitioned among several jobs, then, rather than barrier synchronizations, we desire multiple disjoint barrier synchronizations (MDBSs), where processors within each partition barrier synchronize among themselves without interfering with other partitions. Barrier trees can be adapted to handle MDBSs, but only if the partitions are constrained to be of very special sizes and shapes. These stringent constraints on partitioning often run contrary to other important considerations, such as the contiguity of the processors of each partition within the data network. Our MDBS network design allows for any number of partitions of any size and shape, as long as the processors comprising each partition are contiguous in the data network.
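To make the notion of nested barriers concrete, the sketch below is an illustrative example only (it is not taken from the paper and does not implement either hardware scheme): it expresses a data parallel program with two levels of control nesting using POSIX software barriers in C. The thread count and loop bounds are arbitrary assumptions. Each pthread_barrier_wait() call marks a point that a hardware barrier tree would instead resolve with a single network operation, which is typically an order of magnitude faster.

/* Illustrative sketch: two nesting levels of barrier synchronization
 * expressed with POSIX software barriers. NPROC, OUTER, and INNER are
 * arbitrary values chosen for the example. Compile with: cc -pthread */
#include <pthread.h>
#include <stdio.h>

#define NPROC 4          /* number of worker threads ("processors") */
#define OUTER 3          /* iterations of the outer data parallel phase */
#define INNER 2          /* iterations of the nested inner phase */

static pthread_barrier_t outer_bar;   /* barrier for the outer loop */
static pthread_barrier_t inner_bar;   /* barrier for the nested inner loop */

static void *worker(void *arg)
{
    long id = (long)arg;
    (void)id;

    for (int i = 0; i < OUTER; i++) {
        /* ... outer-phase work for this thread ... */
        for (int j = 0; j < INNER; j++) {
            /* ... inner-phase work ... */
            pthread_barrier_wait(&inner_bar);   /* nested (inner) barrier */
        }
        pthread_barrier_wait(&outer_bar);       /* enclosing (outer) barrier */
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NPROC];

    pthread_barrier_init(&outer_bar, NULL, NPROC);
    pthread_barrier_init(&inner_bar, NULL, NPROC);

    for (long i = 0; i < NPROC; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < NPROC; i++)
        pthread_join(tid[i], NULL);

    pthread_barrier_destroy(&outer_bar);
    pthread_barrier_destroy(&inner_bar);
    puts("all threads passed every nested barrier");
    return 0;
}

In a program like this, the depth of nesting grows with the depth of nested parallel control structures; the paper's second scheme would track all such levels with a single integer tree, so that a 32-bit tree already covers more than four billion levels.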
