Abstract

The rapid growth of stream applications in financial markets, health care, education, social media, and sensor networks represents a remarkable milestone for data processing and analytics in recent years, leading to new challenges in handling Big Data in real time. Traditionally, a single cloud infrastructure hosts the deployment of Stream Processing applications because it offers extensive and adaptive virtual computing resources. Data sources therefore send data to the cloud infrastructure from distant and disparate locations, increasing application latency. The cloud infrastructure may itself be geographically distributed, and it must run a set of frameworks to handle communication. These frameworks typically comprise a Message Queue System and a Stream Processing Framework; in Multi-Cloud deployments, each service runs in a different cloud and communicates over high-latency network links. This makes it difficult to meet real-time application requirements, because the data streams exhibit different and unpredictable latencies, forcing cloud providers' communication systems to adjust continually to environment changes. Previous works explore static micro-batches, demonstrating their potential to overcome communication issues. This paper introduces BurstFlow, a tool for enhancing communication between data sources located at the edges of the Internet and Big Data Stream Processing applications located in cloud infrastructures. BurstFlow introduces a strategy for adjusting micro-batch sizes dynamically according to the time required for communication and computation. BurstFlow also presents an adaptive data partition policy that distributes incoming streams across available machines by considering memory and CPU capacities. Experiments on a real-world multi-cloud deployment show that BurstFlow can reduce execution time by up to 77% compared to state-of-the-art solutions, improving CPU efficiency by up to 49%.
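The abstract's core mechanism, adjusting micro-batch sizes according to measured communication and computation time, can be illustrated with a minimal sketch. This is not BurstFlow's actual algorithm (the paper does not publish it here); the class name, growth factors, and the target communication-to-total-time ratio are all assumptions made for illustration.

```python
class MicroBatchSizer:
    """Hypothetical sketch of dynamic micro-batch sizing: grow the batch
    when per-message communication overhead dominates, shrink it when
    communication is cheap so end-to-end latency stays low."""

    def __init__(self, initial_size=100, min_size=10, max_size=10_000,
                 target_comm_ratio=0.1):
        self.size = initial_size              # events per message
        self.min_size = min_size
        self.max_size = max_size
        self.target_comm_ratio = target_comm_ratio  # comm time / total time

    def update(self, comm_time, compute_time):
        """Adjust the batch size after each batch using measured times."""
        total = comm_time + compute_time
        if total == 0:
            return self.size                  # nothing measured; keep size
        ratio = comm_time / total
        if ratio > self.target_comm_ratio:
            # Communication dominates: pack more events per message.
            self.size = min(self.max_size, int(self.size * 1.5))
        elif ratio < self.target_comm_ratio / 2:
            # Communication is cheap: smaller batches reduce latency.
            self.size = max(self.min_size, int(self.size * 0.8))
        return self.size
```

A feedback loop like this would call `update()` once per completed batch, so the size converges as network conditions between clouds change.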

Highlights

  • The advent of the Internet of Things (IoT) has led to new challenges in the Big Data era due to the limitations of storage, computation, and communication of existing devices, since IoT devices generate massive amounts of data that require processing to support decision-making

  • The proposed solution overcomes existing orchestration issues presented in cloud-based stream processing frameworks

  • BurstFlow enables controlling the distribution of data to each operator replica by employing ad-hoc partitioning policies


Summary

INTRODUCTION

The advent of the Internet of Things (IoT) has led to new challenges in the Big Data era due to the limitations of storage, computation, and communication of existing devices, since IoT devices generate massive amounts of data that require processing to support decision-making. SP applications consume one message at a time from the MQS because they assume the data source is co-located in the data center, neglecting the existing network latency. Another issue is that cloud-based frameworks assume homogeneous workloads when distributing data in the SP framework and neglect performance metrics such as data ingestion, memory, or network utilization. Unlike the works found in the literature, BurstFlow uses an adaptive and dynamic model to estimate the number of events per message, based on feedback loops that monitor the batch size in the memory of workers before forwarding data. This approach overcomes contention scenarios while maintaining network stability. BurstFlow exploits this gap to propose a flow partition approach.
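The summary's second idea, a partitioning policy that accounts for heterogeneous workers instead of assuming homogeneous workloads, can be sketched as weighted selection over operator replicas. The function name, the replica dictionary fields, and the choice of free memory times idle CPU as the weight are illustrative assumptions, not BurstFlow's published policy.

```python
import random

def pick_replica(replicas):
    """Hypothetical resource-aware partitioner: weight each operator
    replica by its free memory and idle CPU share, then sample an index
    proportionally, so loaded machines receive fewer incoming events."""
    weights = [r["free_mem"] * r["idle_cpu"] for r in replicas]
    total = sum(weights)
    if total == 0:
        # No capacity information available: fall back to uniform choice.
        return random.randrange(len(replicas))
    # Weighted sampling: walk the cumulative weight distribution.
    x = random.uniform(0, total)
    cum = 0.0
    for i, w in enumerate(weights):
        cum += w
        if x < cum:
            return i
    return len(weights) - 1
```

In contrast, the hash- or round-robin partitioners common in cloud-based SP frameworks ignore these per-machine metrics, which is the gap the summary describes.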

PROBLEM STATEMENT
EXPERIMENTAL SETUP
Findings
CONCLUSIONS AND FUTURE WORK

