Abstract

Barrier synchronization is commonly used for synchronizing processors prior to a join operation and to enforce data dependencies during the execution of parallelized loops. Simple software implementations of barrier synchronization can result in memory hot-spots, especially in large scale shared-memory multiprocessors containing hundreds of processors and memory modules communicating through an interconnection network. A software combining tree can be used to substantially reduce memory contention due to hot-spots. However, such an implementation results inO(logn) latency in recognition of barrier synchronization, wheren is the number of processors. In this paper anadaptive software combining tree is used to implement a scalable barrier withO(1) recognition latency. The processors that arrive early at the barrier adapt the combining tree so that it has a structure appropriate for reducing the latency for the processors that arrive later. We also show how adaptive combining trees can be used to implement the fuzzy barrier. The fuzzy barrier mechanism reduces the idling of processors at the barriers by allowing the processors to execute useful instructions while they are waiting at the barrier.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call