Stability and Optimization of Speculative Queueing Networks

Jonatha Anselmi, Neil Walton

doi:10.1109/tnet.2021.3128778

Abstract

We provide a queueing-theoretic framework for job replication schemes based on the principle “replicate a job as soon as the system detects it as a straggler”. This is called job speculation. Recent works have analyzed replication on arrival, which we refer to as replication. Replication is motivated by its implementation in Google’s BigTable. However, systems such as Apache Spark and Hadoop MapReduce implement speculative job execution. The performance and optimization of speculative job execution are not well understood. To this end, we propose a queueing network model for load balancing where each server can speculate on the execution time of a job. Specifically, each job is initially assigned to a single server by a frontend dispatcher. Then, when its execution begins, the server sets a timeout. If the job completes before the timeout, it leaves the network; otherwise, the job is terminated and relaunched or resumed at another server, where it will complete. We provide a necessary and sufficient condition for the stability of speculative queueing networks with heterogeneous servers, general job sizes and scheduling disciplines. We find that speculation can increase the stability region of the network when compared with standard load balancing models and replication schemes. We provide general conditions under which timeouts increase the size of the stability region and derive a formula for the optimal speculation time, i.e., the timeout that minimizes the load induced through speculation. We compare speculation with redundant-$d$ and redundant-to-idle-queue-$d$ rules under an $S\&X$ model. For lightly loaded systems, redundancy schemes provide better response times. However, for moderate to heavy loads, redundancy schemes can lose capacity and have markedly worse response times when compared with the proposed speculative scheme.
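The timeout mechanism described in the abstract can be illustrated with a small calculation. The sketch below is not the paper's model or its formula for the optimal speculation time. It is a toy example under stated assumptions: execution times follow a hypothetical discrete distribution, a timed-out job is relaunched from scratch (not resumed) on a second server with an independent execution time drawn from the same distribution, and that second run always completes. All function names are illustrative.

```python
from typing import Sequence, Tuple

# A discrete execution-time distribution: (execution_time, probability) pairs.
# This representation is an assumption made for illustration only.
Dist = Sequence[Tuple[float, float]]


def mean(dist: Dist) -> float:
    """Expected execution time of a single uninterrupted run."""
    return sum(t * p for t, p in dist)


def expected_work(dist: Dist, timeout: float) -> float:
    """Expected total server time consumed per job under a timeout.

    Assumed toy model: the first run is cut off at `timeout`; a job that
    times out is relaunched from scratch elsewhere with a fresh,
    independent execution time and then runs to completion.
    """
    first_run = sum(min(t, timeout) * p for t, p in dist)
    p_timeout = sum(p for t, p in dist if t > timeout)
    return first_run + p_timeout * mean(dist)


def best_timeout(dist: Dist) -> float:
    """Timeout minimizing expected work in the toy model.

    Between two consecutive support points the timeout probability is
    constant while the first-run cost does not decrease, so checking the
    support points themselves is enough.
    """
    candidates = sorted({t for t, _ in dist})
    return min(candidates, key=lambda tau: expected_work(dist, tau))


if __name__ == "__main__":
    # Hypothetical workload: 90% short jobs, 10% stragglers.
    workload = [(1.0, 0.9), (10.0, 0.1)]
    tau = best_timeout(workload)
    print("no timeout:", mean(workload))
    print("best timeout:", tau, "expected work:", expected_work(workload, tau))
```

For this hypothetical workload, a timeout at 1 time unit lowers the expected work per job from about 1.9 to about 1.19, which gives some intuition for how speculation might reduce the load on the network and so enlarge its stability region. A timeout set too early has the opposite effect, since every job is then cut off and rerun.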

Journal: IEEE/ACM Transactions on Networking: a joint publication of the IEEE Communications Society, the IEEE Computer Society, and the ACM with its Special Interest Group on Data Communication
Publication Date: Apr 1, 2022
Citations: 1
License type: publisher-specific, author manuscript

Similar Papers

PaRS: A Popularity-Aware Redundancy Scheme for In-Memory Stores
Panping Zhou ... Xiao Qin
IEEE Transactions on Computers | VOL. 68
01 Apr 2019

R2D: Combining replication and redundancy to enhance the performance and reliability of storage systems
Min-Chun Chen ... Yun-Shan Hsieh
01 May 2017

Mitigate data skew caused stragglers through ImKP partition in MapReduce
Xue Ouyang ... Stephen Clement
01 Dec 2017

Scalable and Distributed Mechanisms for Integrated Scheduling and Replication in Data Grids
Anirban Chakrabarti ... Shubhashis Sengupta
05 Jan 2008
