POTUS: Predictive Online Tuple Scheduling for Data Stream Processing Systems

Xi Huang, Yang Yang, Ziyu Shao

doi:10.1109/tcc.2020.3032577

Abstract

Most online service providers deploy their own data stream processing systems in the cloud to conduct large-scale, real-time data analytics. However, such systems, e.g., Apache Heron, often adopt naive scheduling schemes to distribute data streams (in units of tuples) among processing instances, which may result in workload imbalance and system disruption. Hence, there is still a mismatch between the temporal variations of data streams and such inflexible scheduling designs. Besides, the fundamental limits of the benefits of predictive scheduling for data stream processing systems remain unexplored. In this article, we focus on the problem of tuple scheduling with predictive service in Apache Heron. With a careful choice of granularity in system modeling and decision making, we formulate the problem as a stochastic network optimization problem and propose POTUS, an online predictive scheduling scheme that aims to minimize the response time of data stream processing by steering data streams in a distributed fashion. Theoretical analysis and simulation results show that POTUS achieves an ultra-low response time with a stability guarantee. Moreover, POTUS requires only a modest amount of future information to effectively reduce the response time, even with mis-prediction.
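The workload imbalance mentioned in the abstract can be illustrated with a small sketch. This is not the POTUS algorithm, which the abstract does not specify; it only contrasts a backlog-oblivious round-robin dispatcher with a simple shortest-queue rule. The function names and the initial backlogs are hypothetical and chosen purely for illustration.

```python
from typing import List


def round_robin(backlogs: List[int], n_tuples: int) -> List[int]:
    """Assign tuples cyclically, ignoring current backlog (naive baseline)."""
    result = list(backlogs)
    for i in range(n_tuples):
        result[i % len(result)] += 1
    return result


def shortest_queue(backlogs: List[int], n_tuples: int) -> List[int]:
    """Assign each tuple to the instance with the smallest current backlog.

    Ties go to the lowest-indexed instance. This is an illustrative
    stand-in for a backlog-aware scheduler, not the paper's method.
    """
    result = list(backlogs)
    for _ in range(n_tuples):
        target = min(range(len(result)), key=result.__getitem__)
        result[target] += 1
    return result


# Hypothetical starting backlogs for three processing instances.
initial = [5, 0, 1]
rr = round_robin(initial, 6)
sq = shortest_queue(initial, 6)
print("round robin   :", rr, "max backlog", max(rr))
print("shortest queue:", sq, "max backlog", max(sq))
```

With an uneven starting state, the round-robin dispatcher keeps adding tuples to the already loaded instance, while the backlog-aware rule fills the idle instances first and ends with a lower maximum backlog.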

Journal: IEEE Transactions on Cloud Computing
Publication Date: Oct 29, 2020
Citations: 4

Similar Papers

Dynamic Tuple Scheduling with Prediction for Data Stream Processing Systems
Xi Huang ... Ziyu Shao
01 Dec 2019

Self-Adaptive Data Stream Processing in Geo-Distributed Computing Environments
Gabriele Russo Russo
24 Jun 2019

Revisiting the Design of Data Stream Processing Systems on Multi-Core Processors
Shuhao Zhang ... Thomas Heinze
01 Apr 2017

Pipelined fission for stream programs with dynamic selectivity and partitioned state
B Gedik ... Ö Öztürk
Journal of Parallel and Distributed Computing | VOL. 96
14 May 2016
