Scheduling parallel programs by work stealing with private deques

Umut A Acar, Mike Rainey, Arthur Chargueraud

doi:10.1145/2517327.2442538

Open Access

https://doi.org/10.1145/2517327.2442538

Abstract

Work stealing has proven to be an effective method for scheduling parallel programs on multicore computers. To achieve high performance, work stealing distributes tasks between concurrent queues, called deques, one assigned to each processor. Each processor operates on its deque locally except when performing load balancing via steals. Unfortunately, concurrent deques suffer from two limitations: (1) local deque operations require expensive memory fences in modern weak-memory architectures, and (2) they can be very difficult to extend to support the various optimizations and flexible task distribution strategies needed by many applications, e.g., those that do not fit nicely into the divide-and-conquer, nested data parallel paradigm. For these reasons, there has been a lot of recent interest in implementations of work stealing with non-concurrent deques, where deques remain entirely private to each processor and load balancing is performed via message passing. Private deques eliminate the need for memory fences from local operations and enable the design and implementation of efficient techniques for reducing task-creation overheads and improving task distribution. These advantages, however, come at the cost of communication. It is not known whether work stealing with private deques enjoys the theoretical guarantees of concurrent deques and whether it can be effective in practice. In this paper, we propose two work-stealing algorithms with private deques and prove that the algorithms guarantee theoretical bounds similar to those of work stealing with concurrent deques. For the analysis, we use a probabilistic model and consider a new parameter, the branching depth of the computation. We present an implementation of the algorithm as a C++ library and show that it compares well to Cilk on a range of benchmarks. Since our approach relies on private deques, it enables implementing flexible task creation and distribution strategies. As a specific example, we show how to implement task coalescing and steal-half strategies, which can be important in fine-grain, non-divide-and-conquer algorithms such as graph algorithms, and apply them to the depth-first-search problem.
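
The abstract does not spell out the message-passing protocol, so the following is only a minimal sketch of how load balancing with private deques and an optional steal-half policy might look. It is a single-threaded, round-based simulation, not the authors' library: each simulated worker owns a deque that nobody else touches, an idle worker posts its id in a victim's request cell, and the victim answers by moving tasks into the thief's inbox. All names (`Worker`, `request`, `inbox`, `simulate`) and the toy workload (unfolding a binary tree) are assumptions made for illustration. A real implementation would run the workers on threads and make the request and transfer cells atomic.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <random>
#include <vector>

// Hypothetical names throughout; this simulates the idea, not the paper's code.
constexpr int kNoRequest = -1;

struct Worker {
    std::deque<int> tasks;     // private deque, touched only by its owner
    std::vector<int> inbox;    // "transfer cell": tasks sent to this worker
    int request = kNoRequest;  // "request cell": id of a thief asking for work
    bool waiting = false;      // our own steal request is still unanswered
    long executed = 0;         // tasks this worker has run
};

struct Stats {
    std::vector<long> executed;  // tasks run by each worker
    long rounds = 0;             // simulated time steps
    long moved = 0;              // tasks that changed owner
};

// A task is the height of a binary tree still to unfold: running a task of
// height h > 0 spawns two tasks of height h - 1.
Stats simulate(int num_workers, int height, bool steal_half, unsigned seed) {
    assert(num_workers >= 1 && height >= 0);
    std::vector<Worker> ws(num_workers);
    std::mt19937 rng(seed);
    Stats st;
    ws[0].tasks.push_back(height);
    long remaining = 1;  // tasks created but not yet executed

    while (remaining > 0) {
        ++st.rounds;
        for (int i = 0; i < num_workers; ++i) {
            Worker& me = ws[i];
            // 1. Poll the request cell and answer the thief (possibly with nothing).
            if (me.request != kNoRequest) {
                Worker& thief = ws[me.request];
                std::size_t n = me.tasks.size();
                std::size_t give = steal_half ? n / 2 : (n >= 2 ? 1 : 0);
                for (std::size_t k = 0; k < give; ++k) {
                    thief.inbox.push_back(me.tasks.front());  // oldest tasks first
                    me.tasks.pop_front();
                }
                st.moved += static_cast<long>(give);
                thief.waiting = false;
                me.request = kNoRequest;
            }
            // 2. Move any tasks we were sent into our private deque.
            if (!me.inbox.empty()) {
                me.tasks.insert(me.tasks.end(), me.inbox.begin(), me.inbox.end());
                me.inbox.clear();
            }
            if (!me.tasks.empty()) {
                // 3. Run the newest local task; no synchronization is involved.
                int h = me.tasks.back();
                me.tasks.pop_back();
                ++me.executed;
                --remaining;
                if (h > 0) {
                    me.tasks.push_back(h - 1);
                    me.tasks.push_back(h - 1);
                    remaining += 2;
                }
            } else if (!me.waiting && num_workers > 1) {
                // 4. Idle: ask a random other worker, if its request cell is free.
                std::uniform_int_distribution<int> pick(0, num_workers - 2);
                int v = pick(rng);
                if (v >= i) ++v;  // skip ourselves
                if (ws[v].request == kNoRequest) {
                    ws[v].request = i;
                    me.waiting = true;
                }
            }
        }
    }
    for (const Worker& w : ws) st.executed.push_back(w.executed);
    return st;
}
```

Calling, say, `simulate(8, 16, true, 7)` and `simulate(8, 16, false, 7)` and comparing `rounds` and `moved` gives a rough feel for the trade-off between stealing one task and stealing half. In this sketch a victim always keeps at least one task for itself, which is a design choice of the example rather than something the abstract states.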

Journal: ACM SIGPLAN Notices
Publication Date: Feb 23, 2013
Citations: 22
License type: other-oa
