Abstract

Parallel libraries such as OpenMP distribute the iterations of parallel-for-loops among the threads according to a programmer-specified scheduling policy. While the existing scheduling policies perform reasonably well for balanced workloads, for computations with highly imbalanced workloads it is extremely non-trivial to obtain an efficient distribution of work (even with non-static scheduling policies such as dynamic and guided). In this paper, we present a scheme called COst aware Work Stealing (COWS) that efficiently extends the idea of work-stealing to OpenMP. In contrast to traditional work-stealing schedulers, COWS takes into consideration that (i) not all iterations of a parallel-for-loop may take the same amount of time, (ii) identifying a suitable victim for stealing is important for load balancing, and (iii) queues lead to significant overheads in traditional work-stealing and should be avoided. We present two variations of COWS: WSRI (a naive work-stealing scheme based on the number of remaining iterations) and WSRW (a work-stealing scheme based on the amount of remaining workload). Since in irregular loops, such as those found in graph analytics, the cost of the iterations of a parallel-for-loop cannot be computed statically, we use a combined compile-time + runtime approach: the remaining workload of a loop is computed efficiently at runtime using code generated by our compile-time component. We evaluated COWS on seven benchmark programs, using five input datasets, on two hardware platforms, across a varying number of threads, for a total of 275 configurations. In 225 of the 275 configurations, our approach achieves clear performance gains over the best OpenMP scheduling scheme for that configuration.
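The abstract does not spell out implementation details, but the distinction it draws between WSRI and WSRW can be illustrated with a minimal victim-selection sketch in C. Everything below is a hypothetical illustration under assumed names, not the paper's actual runtime: the chunk_state struct, the pick_victim_* functions, and the rem_work field (which the paper's compiler-generated cost code would presumably maintain) are all invented for exposition, and synchronization around the shared per-thread state is omitted for brevity.

```c
#include <stddef.h>

/* Hypothetical per-thread bookkeeping for one parallel-for-loop. */
typedef struct {
    size_t next_iter;  /* next iteration this thread will execute       */
    size_t end_iter;   /* one past the last iteration in its chunk      */
    double rem_work;   /* estimated remaining workload (WSRW); assumed  */
                       /* to be updated by compiler-generated cost code */
} chunk_state;

/* WSRI-style victim selection: pick the thread with the largest
 * number of remaining iterations. */
static int pick_victim_wsri(const chunk_state *s, int nthreads, int self) {
    int victim = -1;
    size_t best = 0;
    for (int t = 0; t < nthreads; ++t) {
        if (t == self) continue;
        size_t rem = s[t].end_iter - s[t].next_iter;
        if (rem > best) { best = rem; victim = t; }
    }
    return victim;  /* -1 means no thread has work left to steal */
}

/* WSRW-style victim selection: same idea, but rank victims by the
 * estimated remaining *workload* rather than the iteration count. */
static int pick_victim_wsrw(const chunk_state *s, int nthreads, int self) {
    int victim = -1;
    double best = 0.0;
    for (int t = 0; t < nthreads; ++t) {
        if (t == self) continue;
        if (s[t].rem_work > best) { best = s[t].rem_work; victim = t; }
    }
    return victim;
}
```

The design point this sketch is meant to surface: when per-iteration costs are highly imbalanced, as in graph analytics, WSRW's cost-based ranking can mark a thread holding a few expensive iterations as an attractive steal target, whereas WSRI's iteration count would overlook it.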
