Abstract

In recent years, real-time processing and big data have received considerable attention as embedded systems have spread into almost every area of life. This has created many challenges that must be solved to improve systems that operate on big real-time data. Apache Storm is a distributed system for computing and analyzing big real-time data. This paper develops a scheduler that improves the scheduling of applications, represented as topologies, on a Storm cluster. The proposed scheduler is a hybrid of the scheduling algorithms of A3 Storm and the Workload scheduler. Its objective is to minimize communication between tasks while balancing the workload across all cluster machines. The proposed scheduler is compared with A3 Storm and with Fischer and Bernstein’s scheduling algorithm using four different topologies. The experimental results show that the proposed scheduler outperforms the other two schedulers in throughput and complete latency.

Highlights

  • Real-time applications such as IoT sensors, climate, and healthcare produce a large amount of continuous real-time data

  • The paper's contribution is a hybrid of two algorithms, the Workload scheduling algorithm and the A3 Storm algorithm, which improves the performance of Apache Storm

  • The experimental study is run on an Apache Storm cluster with a Nimbus node, a ZooKeeper node, and two supervisor nodes that have three and four slots, respectively
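
The objective stated in the abstract, less communication between tasks together with a balanced workload, can be illustrated with a toy greedy placement. The sketch below is not the paper's algorithm, nor A3 Storm or the Workload scheduler. The function names, the `balance_weight` parameter, the traffic and load figures, and the one-task-per-slot simplification are all assumptions made for illustration; only the slot counts (three and four) come from the cluster described above.

```python
# Toy illustration only: NOT the paper's scheduler. All names and numbers
# here are hypothetical, except the slot counts (3 and 4) from the text.

def place_tasks(tasks, traffic, load, slots, balance_weight=1.0):
    """Greedily assign each task to a node.

    A node scores higher when it already hosts tasks that exchange tuples
    with the candidate task, and lower when it already carries a lot of load.
    Simplifying assumption: each slot hosts exactly one task.
    """
    assignment = {}
    node_load = {node: 0.0 for node in slots}
    free_slots = dict(slots)
    # Place the heaviest tasks first so they can be spread out early.
    for task in sorted(tasks, key=lambda t: -load[t]):
        best_node, best_score = None, None
        for node in slots:
            if free_slots[node] == 0:
                continue
            # Traffic that would stay on this node if the task ran here.
            colocated = sum(
                rate for (src, dst), rate in traffic.items()
                if (src == task and assignment.get(dst) == node)
                or (dst == task and assignment.get(src) == node)
            )
            score = colocated - balance_weight * node_load[node]
            if best_score is None or score > best_score:
                best_node, best_score = node, score
        if best_node is None:
            raise ValueError("not enough free slots for all tasks")
        assignment[task] = best_node
        node_load[best_node] += load[task]
        free_slots[best_node] -= 1
    return assignment


def inter_node_traffic(assignment, traffic):
    """Sum the tuple rate of every edge whose endpoints sit on different nodes."""
    return sum(rate for (src, dst), rate in traffic.items()
               if assignment[src] != assignment[dst])


# Hypothetical four-task topology on the two supervisor nodes.
tasks = ["spout", "parse", "count", "sink"]
load = {"spout": 2.0, "parse": 3.0, "count": 3.0, "sink": 1.0}
traffic = {("spout", "parse"): 10.0, ("parse", "count"): 8.0, ("count", "sink"): 2.0}
slots = {"supervisor-1": 3, "supervisor-2": 4}

for weight in (1.0, 5.0):
    placement = place_tasks(tasks, traffic, load, slots, balance_weight=weight)
    print(weight, placement, inter_node_traffic(placement, traffic))
```

Raising `balance_weight` makes the sketch spread load more evenly at the cost of more traffic between nodes, which is the trade-off a hybrid scheduler of this kind has to manage.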


Summary

Introduction

Real-time applications such as IoT sensors, climate, and healthcare produce a large amount of continuous real-time data. This type of data grows rapidly and can reach quintillions of bytes every day. This extreme and rapid growth gave rise to the term “big data” [1]. Stream processing refers to processing a large amount of data in real time. Big data requires specialized platforms for processing, such as Hadoop for batch processing and Apache Storm, S4, Spark, and Flink for real-time streaming applications [4].

