Asynchronous Work Stealing on Distributed Memory Systems

Shigang Li, Jingyuan Hu, Chongchong Zhao, Xin Cheng

doi:10.1109/pdp.2013.35

Abstract

Work stealing is a popular policy for dynamic load balancing of irregular applications. However, the communication overhead incurred by work stealing may make it less efficient, especially on distributed memory systems. In this work, we propose an asynchronous work stealing (AsynchWS) strategy that exploits opportunities to overlap communication with the execution of local residual tasks. Profiling information is collected locally to optimize task granularity and to guide the asynchronous work stealing. AsynchWS is implemented in Unified Parallel C (UPC), which effectively supports non-blocking one-sided communication and facilitates the implementation. Experiments are conducted on a 32-node Xeon X5650 cluster using a set of irregular applications. Results show that AsynchWS achieves up to 16% better performance than state-of-the-art strategies on distributed memory.
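
The overlap idea can be illustrated with a small simulation. This is only a minimal sketch under simplifying assumptions, not the authors' UPC implementation: it models a single worker, a fixed steal latency, and exactly one steal that always succeeds. The function name `simulate` and all of its parameters are hypothetical and chosen for illustration. Setting `threshold` to 0 models a synchronous steal issued only once the local queue is empty, while a positive `threshold` issues the steal early and keeps executing the residual tasks while the request is in flight.

```python
from collections import deque


def simulate(local_tasks, task_cost, steal_latency, threshold, stolen_tasks):
    """Simulate one worker performing a single steal (illustrative model only).

    The steal request is issued as soon as the local queue holds `threshold`
    tasks or fewer. Returns (finish_time, idle_time).
    """
    queue = deque([task_cost] * local_tasks)
    clock = 0.0
    idle = 0.0
    arrival = None        # time at which the in-flight steal completes
    steal_issued = False  # this sketch issues at most one steal

    while True:
        # Issue the non-blocking steal once the queue drains to the threshold.
        if not steal_issued and len(queue) <= threshold:
            arrival = clock + steal_latency
            steal_issued = True

        # If the reply has arrived, enqueue the stolen tasks.
        if arrival is not None and clock >= arrival:
            queue.extend([task_cost] * stolen_tasks)
            arrival = None

        if queue:
            # Execute one local (residual or stolen) task.
            clock += queue.popleft()
        elif arrival is not None:
            # Nothing left to run: wait idle until the stolen work arrives.
            idle += arrival - clock
            clock = arrival
        else:
            break

    return clock, idle


if __name__ == "__main__":
    # Synchronous steal: the full latency shows up as idle time.
    print(simulate(10, 1.0, 3.0, threshold=0, stolen_tasks=5))
    # Asynchronous steal: three residual tasks hide the latency.
    print(simulate(10, 1.0, 3.0, threshold=3, stolen_tasks=5))
```

In this toy model, the idle time shrinks by the amount of residual work executed while the steal is in flight, which is the effect the asynchronous strategy aims for. How the threshold and task granularity would be chosen from profiling information is not modeled here.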

Full Text