Performance Analysis of Work Stealing in Large-scale Multithreaded Computing

Nikki Sonenberg,Grzegorz Kielanski,Benny Van Houdt

doi:10.1145/3470887

Abstract

Randomized work stealing is used in distributed systems to increase performance and improve resource utilization. In this article, we consider randomized work stealing in a large system of homogeneous processors where parent jobs spawn child jobs that can feasibly be executed in parallel with the parent job. We analyse the performance of two work stealing strategies: one where only child jobs can be transferred across servers and the other where parent jobs are transferred. We define a mean-field model to derive the response time distribution in a large-scale system with Poisson arrivals and exponential parent and child job durations. We prove that the model has a unique fixed point that corresponds to the steady state of a structured Markov chain, allowing us to use matrix analytic methods to compute the unique fixed point. The accuracy of the mean-field model is validated using simulation. Using numerical examples, we illustrate the effect of different probe rates, load, and different child job size distributions on performance with respect to the two stealing strategies, individually, and compared to each other.

Full Text