Abstract

Cloud computing has evolved big data technologies into a consolidated paradigm with SPaaS (Stream Processing-as-a-Service). With a number of enterprises offering cloud-based solutions to end users and smaller enterprises, the volume of data has boomed, creating interest in both industry and academia in big data analytics, streaming applications, and social networking applications. As companies shift to the cloud-based solutions-as-a-service paradigm, competition in the market grows. Good quality of service (QoS) is a must for enterprises striving to survive in a competitive environment. However, achieving reasonable QoS goals that meet service-level agreements (SLAs) cost-effectively is challenging because workloads vary over time. This problem can be solved if the system can predict the workload for the near future. In this paper, we present a novel topology-refining scheme based on a workload prediction mechanism. Predictions are made through a model that combines support vector regression (SVR) with autoregressive and moving-average models and a feedback mechanism. Our streaming system is designed to increase overall performance by making topology refinement robust to the incoming workload on the fly, while still achieving the QoS goals of the SLA constraints. The Apache Flink distributed processing engine is used as a testbed. Results show that the prediction scheme works well for both workloads, i.e., synthetic as well as real data traces.
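The prediction model described above combines autoregressive, moving-average, and SVR components with a feedback mechanism. The sketch below keeps only the autoregressive and moving-average terms with a simple error-driven re-weighting step; the SVR component is omitted, and all names (`WorkloadPredictor`, `feedback`) are our own illustrative simplifications, not the paper's actual model.

```python
# Simplified workload predictor: blends an AR(1) term with a moving
# average, and uses a feedback step to re-weight the blend after each
# observation. The SVR component of the full model is omitted here.

class WorkloadPredictor:
    def __init__(self, window=3, alpha=0.5):
        self.window = window        # moving-average window size
        self.alpha = alpha          # weight on the autoregressive term
        self.last_prediction = None

    def predict(self, history):
        """Predict the next workload value from a list of past values."""
        ar_term = history[-1]                     # AR(1): repeat last value
        recent = history[-self.window:]
        ma_term = sum(recent) / len(recent)       # moving average
        self.last_prediction = self.alpha * ar_term + (1 - self.alpha) * ma_term
        return self.last_prediction

    def feedback(self, actual):
        """Small corrective step on alpha based on the last error."""
        if self.last_prediction is None:
            return
        error = actual - self.last_prediction
        step = 0.01 * error / max(abs(actual), 1)
        self.alpha = min(1.0, max(0.0, self.alpha + step))
```

A steady workload leaves the prediction unchanged, while a rising workload pulls the forecast between the last value and the recent average.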

Highlights

  • With the evolution of cloud computing from a set of promising virtualization and data-center technologies to a centralized paradigm for delivering computing as a service to customers in a pay-as-you-go manner, enterprise adoption of the technology is growing fast, and so is the number of cloud-based companies offering cloud services to end customers

  • a) Data parallelism: a larger dataset is split into more manageable subsets, through either physical or logical partitioning, allowing tasks to execute in parallel across the subsets. b) Incremental processing: most distributed stream processing systems can process data incrementally, as opposed to batch processing, where each operator processes all the data and forwards the gathered data to the next operator in a repeated loop, resulting in a significant delay of the final result

  • In order to demonstrate the generality of the scheme with a varying number of parallel threads, we plotted the default parallelism of Apache Flink, the Autoregressive Integrated Moving Average (ARIMA)-based TRS, and the decisions taken by the ARIMA + support vector regression (SVR) TRS optimization
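The data-parallelism idea from the highlights above can be sketched as partitioning a dataset into subsets that are processed concurrently and then merged. This is a generic Python illustration, not Flink's API; `partition`, `process_subset`, and `parallel_process` are our own names.

```python
# Data parallelism sketch: logically partition a dataset into subsets,
# process the subsets concurrently, then merge the partial results.
from concurrent.futures import ThreadPoolExecutor

def partition(data, n_parts):
    """Split `data` into n_parts roughly equal contiguous subsets."""
    k, r = divmod(len(data), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + k + (1 if i < r else 0)
        parts.append(data[start:end])
        start = end
    return parts

def process_subset(subset):
    """Stand-in operator: here, just sum the subset."""
    return sum(subset)

def parallel_process(data, n_parts=4):
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(process_subset, partition(data, n_parts)))
    return sum(partials)  # merge the partial results
```

In a real stream processing engine the partitioning is done by the framework and the "operator" is user code; the point here is only that the subsets are independent, so they can run in parallel.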


Summary

Introduction

With the evolution of cloud computing from a set of promising virtualization and data-center technologies to a centralized paradigm for delivering computing as a service to customers (like other utilities such as water, gas, and electricity) in a pay-as-you-go manner, enterprise adoption of the technology is growing fast, and so is the number of cloud-based companies offering cloud services to end customers. We present our TRS (Topology Refining Scheme), a system capable of refining and re-adjusting the topology of stream processing systems on the fly at run-time, based on autoregressive and moving-average workload prediction models. To handle such a vast amount of seemingly limitless data efficiently and at scale, a host of stream processing systems emerged, including the Dataflow model [14], Samza [15], Storm, and Flink. These frameworks deal with all arriving real-time streams, which are distributed across the nodes in the cluster. Seasonal spikes typically occur over holidays, like Christmas, while unexpected spikes can happen at any point across the year. To handle this immense workload, a system must be able to scale the parallelism of the operators in the pipeline upwards or downwards, depending on the arriving data streams. The prediction model relies on the auto-covariance function $C_h$, defined as in Eq. (9):

$C_h = \frac{1}{n}\sum_{t=h+1}^{n}\left(X_t - \bar{X}\right)\left(X_{t-h} - \bar{X}\right)$
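As a sanity check, the sample auto-covariance of Eq. (9) can be computed directly in a few lines of plain Python; `autocov` is our own illustrative name.

```python
# Sample auto-covariance at lag h:
# C_h = (1/n) * sum over t of (X_t - mean) * (X_{t-h} - mean),
# where the sum runs over the n - h overlapping pairs of the series.

def autocov(x, h):
    """Sample auto-covariance of series x at lag h."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t - h] - m) for t in range(h, n)) / n
```

Note that `autocov(x, 0)` is the (biased) sample variance, and the autocorrelation at lag h is `autocov(x, h) / autocov(x, 0)`.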
Related work
Findings
Concluding remarks and future directions
