Proactive work stealing for futures

Kyle Singer, I-Ting Angelina Lee, Yifan Xu

doi:10.1145/3293883.3295735

Abstract

The use of futures provides a flexible way to express parallelism and can generate arbitrary dependences among parallel subcomputations. The additional flexibility that futures provide comes with a cost, however. When scheduled using classic work stealing, a program with futures, compared to a program that uses only fork-join parallelism, can incur a much higher number of "deviations," a metric for evaluating the performance of parallel executions. All prior work, however, assumes a parsimonious work-stealing scheduler, in which a worker thread (a surrogate of a processor) steals work only when its local deque becomes empty. In this work, we investigate an alternative scheduling approach, called ProWS, where the workers perform proactive work stealing when handling future operations. We show that ProWS, for programs that use futures, can provide provably efficient execution time and equal or better bounds on the number of deviations compared to classic parsimonious work stealing. Given a computation with T1 work and T∞ span, ProWS executes the computation on P processors in expected time O(T1/P + T∞ lg P), with an additional lg P overhead on the span term compared to the parsimonious variant. For structured use of futures, where each future is single touch with no race on the future handle, the algorithm incurs O(PT∞²) deviations, matching that of the parsimonious variant. For general use of futures, the algorithm incurs O(mkT∞ + PT∞ lg P) deviations, where mk is the maximum number of future touches that are logically parallel. Compared to the bound for the parsimonious variant, O(kT∞ + PT∞), with k being the total number of touches in the entire computation, this bound is better assuming mk = Ω(P lg P) and mk is smaller than k, which holds true for all the benchmarks we examined.
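To make the comparison between the bounds concrete, the following minimal Python sketch evaluates the leading terms of the expressions stated above with all hidden constants dropped. The function names and the sample values of P, T1, T∞, mk, and k are hypothetical and chosen only for illustration; because asymptotic bounds hide constant factors, the numbers indicate how the terms scale, not actual deviation counts or running times.

```python
import math


def prows_time_bound(t1, t_inf, p):
    """Leading terms of O(T1/P + T_inf lg P), constants dropped."""
    return t1 / p + t_inf * math.log2(p)


def prows_deviation_bound(m_k, t_inf, p):
    """Leading terms of O(m_k T_inf + P T_inf lg P) for general futures."""
    return m_k * t_inf + p * t_inf * math.log2(p)


def parsimonious_deviation_bound(k, t_inf, p):
    """Leading terms of O(k T_inf + P T_inf) for the parsimonious variant."""
    return k * t_inf + p * t_inf


if __name__ == "__main__":
    # Hypothetical values: m_k = P lg P = 64 and m_k much smaller than k.
    p, t1, t_inf = 16, 1_000_000, 100
    m_k, k = 64, 10_000

    print("ProWS time terms:            ", prows_time_bound(t1, t_inf, p))
    print("ProWS deviation terms:       ", prows_deviation_bound(m_k, t_inf, p))
    print("Parsimonious deviation terms:", parsimonious_deviation_bound(k, t_inf, p))
```

With these sample values the mk term and the P lg P term contribute equally to the ProWS expression, and the kT∞ term dominates the parsimonious expression, which is the regime the abstract describes (mk = Ω(P lg P) and mk smaller than k).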

Full Text