Abstract

Nowadays, there is an accelerating need to handle large amounts of continuously arriving data both efficiently and in a timely manner. Streams of big data have led to the emergence of several Distributed Stream Processing Systems (DSPSs) that assign processing tasks to the available resources (dynamically or not) and route streaming data between them. Efficient scheduling of processing tasks can reduce application latency and eliminate network congestion. However, the built-in scheduling techniques of the available DSPSs are far from optimal. In this work, we extend our previous work, in which we proposed a linear scheme for the task allocation and scheduling problem. Our scheme exploits pipelines to efficiently handle applications that require heavy (all-to-all) communication between tasks assigned to pairs of components. Here, we prove that our scheme is periodic, and we provide a communication refinement algorithm and a mechanism to handle many-to-one assignments efficiently. For concreteness, our work is illustrated using Apache Storm semantics. The performance evaluation shows that our algorithm achieves load balance and constrains the required buffer space. For throughput testing, we compared our work to the default Storm scheduler as well as to R-Storm. Our scheme outperformed both strategies, achieving an average improvement of 25%-40% over Storm's default scheduler under different scenarios, mainly as a result of reduced buffering (≈ 45% less memory). Compared to R-Storm, the results indicate an average improvement of 35%-45%.

Highlights

  • Over the past 20 years, data has grown at a large scale and across various fields

  • We discuss how we evaluate the performance of our proposed scheduling approach through experimental results

  • We present two sets of experiments: in the first set, we examine the average latency, the percentage of buffer memory used, the load balancing per node, and the throughput, comparing our system’s performance against the default Apache Storm scheduler on a random and a linear topology


Summary

INTRODUCTION

Over the past 20 years, data has grown at a large scale and across various fields, and the rapid spread of cloud computing and the Internet of Things (IoT) has further accelerated this growth. To the best of our knowledge, most of the existing solutions in the literature rarely consider memory consumption in their analysis, and while they take into account the capability of the resources, they generally ignore their load. In this work, we prove that the task allocation scheme that forms the basis of our scheduler is periodic, so the number of necessary computations can be decreased. Taking into consideration that related tasks should preferably be assigned to the same or adjacent nodes, we provide a communication refinement algorithm that was only briefly described in our previous work. We also add another pipeline-based scheme to handle ‘‘many-to-one’’ assignments between tasks effectively and improve the system’s performance.
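To illustrate the pipeline idea behind such a scheme, the sketch below shows a round-robin-style schedule for the all-to-all communication between two components with m tasks each. This is a minimal, hypothetical illustration (the function name and structure are ours, not the paper's actual algorithm): in each round, every task participates in exactly one exchange, and the whole pattern repeats with period m, which is the kind of periodicity that lets a scheduler compute only one period of the schedule.

```python
def pipelined_all_to_all(m):
    """Schedule the m*m pairwise exchanges between two components of m
    tasks each in m rounds: in round r, upstream task i communicates
    with downstream task (i + r) % m, so every task takes part in
    exactly one exchange per round (a pipeline, not a burst)."""
    return [[(i, (i + r) % m) for i in range(m)] for r in range(m)]

# Example: 4 upstream and 4 downstream tasks -> 4 balanced rounds.
for r, pairs in enumerate(pipelined_all_to_all(4)):
    print(f"round {r}: {pairs}")
```

Because the schedule is periodic, extending it in time simply repeats the same m rounds, so no per-round recomputation is needed.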

BACKGROUND
TASK SCHEDULING
MOTIVATING EXAMPLES
EXPERIMENTAL RESULTS
RELATED WORK