Abstract

Cloud computing has evolved big data technologies into a consolidated paradigm with SPaaS (Stream Processing-as-a-Service). With a number of enterprises offering cloud-based solutions to end users and smaller enterprises, the volume of data has boomed, creating interest in both industry and academia in big data analytics, streaming applications, and social networking applications. As companies shift to the cloud-based solutions-as-a-service paradigm, competition in the market grows. Good quality of service (QoS) is a must for enterprises striving to survive in a competitive environment. However, achieving reasonable QoS goals that meet service-level agreements (SLAs) cost-effectively is challenging because workloads vary over time. This problem can be solved if the system can predict the workload for the near future. In this paper, we present a novel topology-refining scheme based on a workload prediction mechanism. Predictions are made through a model that combines support vector regression (SVR) with autoregressive and moving-average models and a feedback mechanism. Our streaming system is designed to increase overall performance by making topology refinement robust to the incoming workload on the fly, while still achieving the QoS goals of the SLA constraints. The Apache Flink distributed processing engine is used as a testbed. Results show that the prediction scheme works well for both workloads, i.e., synthetic as well as real data traces.
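The prediction model described above combines autoregressive, moving-average, and SVR components with a feedback mechanism. The sketch below keeps only the autoregressive and moving-average terms with a simple error-driven re-weighting step; the SVR component is omitted, and all names (`WorkloadPredictor`, `feedback`) are our own illustrative simplifications, not the paper's actual model.

```python
# Simplified workload predictor: blends an AR(1) term with a moving
# average, and uses a feedback step to re-weight the blend after each
# observation. The SVR component of the full model is omitted here.

class WorkloadPredictor:
    def __init__(self, window=3, alpha=0.5):
        self.window = window        # moving-average window size
        self.alpha = alpha          # weight on the autoregressive term
        self.last_prediction = None

    def predict(self, history):
        """Predict the next workload value from a list of past values."""
        ar_term = history[-1]                     # AR(1): repeat last value
        recent = history[-self.window:]
        ma_term = sum(recent) / len(recent)       # moving average
        self.last_prediction = self.alpha * ar_term + (1 - self.alpha) * ma_term
        return self.last_prediction

    def feedback(self, actual):
        """Small corrective step on alpha based on the last error."""
        if self.last_prediction is None:
            return
        error = actual - self.last_prediction
        step = 0.01 * error / max(abs(actual), 1)
        self.alpha = min(1.0, max(0.0, self.alpha + step))
```

A steady workload leaves the prediction unchanged, while a rising workload pulls the forecast between the last value and the recent average.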

Highlights

  • With the evolution of cloud computing from a set of promising virtualization and data-center technologies to a centralized paradigm for delivering computing as a service to customers in a pay-as-you-go manner, enterprise adoption of the technology is growing fast, and so is the number of cloud-based companies offering cloud services to end customers

  • a) Data parallelism: a larger dataset is split into more manageable subsets, through either physical or logical partitioning, allowing tasks to execute in parallel across the subsets. b) Incremental processing: most distributed stream processing systems can process data incrementally, as opposed to batch processing, where each operator processes all the data and forwards the gathered data to the next operator in a repeated loop, resulting in a significant delay of the final result

  • In order to demonstrate the generality of the scheme with a varying number of parallel threads, we plotted the default parallelism of Apache Flink, the Autoregressive Integrated Moving Average (ARIMA)-based TRS, and the decisions taken by the ARIMA + support vector regression (SVR) TRS optimization
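The data-parallelism idea from the highlights above can be sketched as partitioning a dataset into subsets that are processed concurrently and then merged. This is a generic Python illustration, not Flink's API; `partition`, `process_subset`, and `parallel_process` are our own names.

```python
# Data parallelism sketch: logically partition a dataset into subsets,
# process the subsets concurrently, then merge the partial results.
from concurrent.futures import ThreadPoolExecutor

def partition(data, n_parts):
    """Split `data` into n_parts roughly equal contiguous subsets."""
    k, r = divmod(len(data), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + k + (1 if i < r else 0)
        parts.append(data[start:end])
        start = end
    return parts

def process_subset(subset):
    """Stand-in operator: here, just sum the subset."""
    return sum(subset)

def parallel_process(data, n_parts=4):
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(process_subset, partition(data, n_parts)))
    return sum(partials)  # merge the partial results
```

In a real stream processing engine the partitioning is done by the framework and the "operator" is user code; the point here is only that the subsets are independent, so they can run in parallel.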


Summary

Introduction

With the evolution of cloud computing from a set of promising virtualization and data-center technologies to a centralized paradigm for delivering computing as a service to customers (like other utilities such as water, gas, and electricity) in a pay-as-you-go manner, enterprise adoption of the technology is growing fast, and so is the number of cloud-based companies offering cloud services to end customers. We present our TRS (Topology Refining Scheme), a system capable of refining and re-adjusting the topology of stream processing systems on the fly at run-time, based on autoregressive and moving-average workload prediction models. To handle such a vast amount of seemingly limitless data efficiently and at scale, a host of stream processing systems emerged, including the Dataflow model [14], Samza [15], Storm, and Flink. These frameworks deal with all arriving real-time streams, which are distributed across the nodes in the cluster. Seasonal spikes typically occur over holidays, like Christmas, while unexpected spikes can happen at any point across the year. To handle this immense workload, a system must be able to scale the parallelism of the operators in the pipeline upwards or downwards, depending on the arriving data streams. The prediction model relies on the auto-covariance function $C_h$, defined as in Eq. (9):

$C_h = \frac{1}{n}\sum_{t=h+1}^{n}\left(X_t - \bar{X}\right)\left(X_{t-h} - \bar{X}\right)$
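As a sanity check, the sample auto-covariance of Eq. (9) can be computed directly in a few lines of plain Python; `autocov` is our own illustrative name.

```python
# Sample auto-covariance at lag h:
# C_h = (1/n) * sum over t of (X_t - mean) * (X_{t-h} - mean),
# where the sum runs over the n - h overlapping pairs of the series.

def autocov(x, h):
    """Sample auto-covariance of series x at lag h."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t - h] - m) for t in range(h, n)) / n
```

Note that `autocov(x, 0)` is the (biased) sample variance, and the autocorrelation at lag h is `autocov(x, h) / autocov(x, 0)`.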
Related work
Findings
Concluding remarks and future directions
