SLA-Based Adaptation Schemes in Distributed Stream Processing Engines

Muhammad Hanif, Sumi Helal, Choonhwa Lee, Eunsam Kim

doi:10.3390/app9061045

Open Access

https://doi.org/10.3390/app9061045

Abstract

With the growth in data volume, online information, and large-scale cloud applications, big data analytics has become mainstream in both industry and academia. This has prompted the emergence and development of real-time distributed stream processing frameworks such as Flink, Storm, Spark, and Samza. These frameworks allow complex queries on streaming data to be distributed across multiple worker nodes in a cluster. A few of these frameworks provide basic support for controlling the latency and throughput of the system as well as the correctness of the results. However, none can adjust them on the fly at runtime. We present an informed and efficient adaptive watermarking and dynamic buffering timeout mechanism for distributed streaming frameworks. It is designed to increase the overall throughput of the system by adapting the watermarks to the incoming workload stream and by scaling the buffering timeout dynamically for each task tracker on the fly, while maintaining the Service Level Agreement (SLA)-based end-to-end latency of the system. This work focuses on tuning system parameters (such as window correctness, buffering timeout, and so on) based on predictions of the incoming workload, and assesses whether a given workload will breach an SLA using output metrics including latency, throughput, and correctness of both intermediate and final results. We used Apache Flink as our testbed distributed processing engine. However, the proposed mechanism can be applied to other streaming frameworks as well. Our results on the testbed indicate that the proposed system outperforms the status quo of stream processing. With the inclusion of learning models such as naïve Bayes, multilayer perceptron (MLP), and sequential minimal optimization (SMO), the system shows further improvement in keeping the SLA intact as well as in quality of service (QoS).
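The dynamic buffering timeout idea in the abstract can be pictured as a small feedback loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the class name `BufferTimeoutController`, its parameters (`step`, `headroom`, the bounds), and the multiplicative update rule are all hypothetical. It shrinks the timeout when observed end-to-end latency exceeds the SLA target, so buffers flush sooner, and grows it when there is comfortable slack, so fuller buffers can raise throughput.

```python
from dataclasses import dataclass


@dataclass
class BufferTimeoutController:
    """Hypothetical per-task controller; all names and defaults are illustrative."""

    sla_latency_ms: float           # SLA target for end-to-end latency
    timeout_ms: float = 100.0       # current buffering timeout (illustrative start)
    min_timeout_ms: float = 1.0     # lower clamp
    max_timeout_ms: float = 500.0   # upper clamp
    step: float = 0.2               # relative change applied per update
    headroom: float = 0.8           # fraction of the SLA treated as "safe"

    def update(self, observed_latency_ms: float) -> float:
        """Adjust the timeout from one latency observation and return it."""
        if observed_latency_ms > self.sla_latency_ms:
            # SLA breached: flush buffers sooner to cut latency.
            self.timeout_ms *= 1.0 - self.step
        elif observed_latency_ms < self.headroom * self.sla_latency_ms:
            # Plenty of slack: buffer longer to favor throughput.
            self.timeout_ms *= 1.0 + self.step
        # Otherwise the latency is close to the target: leave the timeout alone.
        self.timeout_ms = min(self.max_timeout_ms,
                              max(self.min_timeout_ms, self.timeout_ms))
        return self.timeout_ms
```

With a 200 ms target, for example, an observation of 250 ms shrinks the timeout from 100 ms to 80 ms, while an observation of 180 ms falls inside the dead band and leaves it unchanged. The dead band between `headroom * sla_latency_ms` and the SLA is a design choice in this sketch to avoid oscillation.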

Highlights

Contemporary data-intensive applications demand a persistent increase in computing power and storage capacity, which in many real-life use cases are needed on demand for a specific operation in the data lifecycle, such as data collection, extraction, processing, and reporting.
We focus on this tradeoff, since an increase in throughput can push latency past the Service Level Agreement (SLA) or hurt window correctness.
The results show that all the workload-analysis-based systems outperform the default system. Under overload, the default system stabilizes its latency at a higher value, so the incoming stream of data waits longer in the queues, causing an SLA breach or a reduction in quality of service (QoS).
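The tradeoff between latency and window correctness named in the highlights can be illustrated with a watermark that trails the largest event timestamp seen by an out-of-orderness bound. The sketch below is a minimal illustration under assumptions, not the paper's algorithm: the function name `adapt_out_of_orderness`, the target late-element ratio, and the fixed step size are hypothetical. A larger bound waits longer for stragglers (better correctness, higher latency); a smaller bound closes windows sooner.

```python
def adapt_out_of_orderness(bound_ms: float,
                           late_elements: int,
                           total_elements: int,
                           target_late_ratio: float = 0.01,
                           step_ms: float = 50.0,
                           min_ms: float = 0.0,
                           max_ms: float = 5000.0) -> float:
    """Return a new out-of-orderness bound from the observed late-element ratio.

    All parameter names and defaults are illustrative assumptions.
    """
    if total_elements <= 0:
        # No observations in this interval: keep the current bound.
        return bound_ms
    late_ratio = late_elements / total_elements
    if late_ratio > target_late_ratio:
        # Too many elements arrive behind the watermark: wait longer.
        bound_ms += step_ms
    elif late_ratio < target_late_ratio / 2:
        # Very few late elements: tighten the bound to reduce latency.
        bound_ms -= step_ms
    return min(max_ms, max(min_ms, bound_ms))


def watermark(max_event_timestamp_ms: float, bound_ms: float) -> float:
    """Bounded out-of-orderness watermark: largest timestamp seen minus the bound."""
    return max_event_timestamp_ms - bound_ms
```

For instance, if 50 of 1000 elements in an interval arrive late against a 1% target, a 1000 ms bound grows to 1050 ms, so the next watermark lags the stream slightly more.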

Summary

Introduction

Contemporary data-intensive applications demand a persistent increase in computing power and storage capacity, which in many real-life use cases are needed on demand for a specific operation in the data lifecycle, such as data collection, extraction, processing, and reporting. These resources need to be elastically scaled up and down according to the incoming workload. Flow [8], and Flink [9] have been developed for this very purpose: to support the dynamic analytics of streaming datasets. These distributed processing systems handle both batch and real-time analytics, which represent the core of modern big data applications. These frameworks orchestrate numerous nodes structured in a cluster and distribute the workload through communication using different messaging techniques.

Problem Statement

Max-Out-Of-Orderness

Buffer Timeout

Subtask

Correctness

Proposed

Late Elements Frequency

Throughput

Latency

Latency Control Mechanism

Dynamic Buffering Timeout

Target Latency Modes

Workload Analysis

System Experimentation

Performance Evaluation Experimentation

Related Research

Concluding Remarks and Future Directions

Methods

Journal: Applied Sciences
Publication Date: Mar 13, 2019
Citations: 3
License type: CC BY 4.0

Similar Papers

An adaptive SLA-based data flow mechanism for stream processing engines
Muhammad Hanif ... Choonhwa Lee
01 Oct 2017

Targeting a light-weight and multi-channel approach for distributed stream processing
Vinu Ellampallil Venugopal ... Amal Tawakuli
Journal of Parallel and Distributed Computing | VOL. 167
02 May 2022

A Backpressure Mitigation Scheme in Distributed Stream Processing Engines
Muhammad Hanif ... Hyeongdeok Yoon
01 Jan 2020

Modeling Distributed Stream Processing Systems Under Heavy Workload
Muhammad Mudassar Qureshi ... Hai Jin
01 Oct 2019