Abstract

Work-stealing is widely used in task-based parallel programming for dynamic load balancing, but its overhead is much higher on distributed-memory systems than on shared-memory systems. To minimize this overhead on a multi-core cluster, we propose a hierarchical work-stealing framework in which work-stealing is performed within a node before crossing the node boundary. Our framework uses two key techniques to reduce inter-node steals: (a) adaptive initial partitioning for different task-parallel patterns, and (b) centralized control of inter-node work-stealing, which improves the efficiency of victim selection and termination detection. We compare our technique with the classical work-stealing scheme and with a state-of-the-art work-stealing scheme [1] for multi-core clusters. Our technique outperforms them by 19% and 8%, respectively.
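
The intra-node-before-inter-node ordering described above can be illustrated with a minimal single-process sketch. It is not the framework's implementation: the names `Worker` and `steal` are hypothetical, the "most loaded victim" policy is an assumption made only to keep the example deterministic, and neither the centralized inter-node control nor the adaptive initial partitioning is modeled.

```python
from collections import deque


class Worker:
    """Hypothetical worker: identified by (node, wid), owns a task deque."""

    def __init__(self, node, wid):
        self.node = node
        self.wid = wid
        self.tasks = deque()


def steal(thief, workers):
    """Try victims on the thief's own node first; look at other nodes
    only if every local deque is empty.

    Returns (task, kind), where kind is "intra", "inter", or None.
    """
    local = [w for w in workers if w.node == thief.node and w is not thief]
    remote = [w for w in workers if w.node != thief.node]
    for kind, group in (("intra", local), ("inter", remote)):
        # Assumed policy: pick the most loaded victim in the group.
        victim = max(group, key=lambda w: len(w.tasks), default=None)
        if victim is not None and victim.tasks:
            # Steal from the end opposite to where an owner would pop.
            return victim.tasks.popleft(), kind
    return None, None


if __name__ == "__main__":
    ws = [Worker(0, 0), Worker(0, 1), Worker(1, 0), Worker(1, 1)]
    ws[1].tasks.extend(["a", "b"])       # work on the thief's node
    ws[2].tasks.extend(["x", "y", "z"])  # work on a remote node
    for _ in range(4):
        print(steal(ws[0], ws))
```

In this sketch the thief drains the same-node deque before any remote task is taken, so inter-node steals occur only once local work is exhausted.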
