An Experimental Based Study to Evaluate the Efficiency among Stream Processing Tools

Akshay Mudgal,Shaveta Bhatia

doi:10.34028/iajit/20/6/11

Abstract

With the advancement in internet technology, augmentation in regular data generation has been amplified at a drastic level. Several different industries, for instance hospitality, defense, railways, health care, social media, education, etc., are creating and crafting different and several types of raw and processed data at a significant level, whereas, each of them has their own unique reason to shelter and call their data imperative and crucial. Such large and huge amount of data needs some space to get saved and secured, this is what Big Data is. A Data Stream Processing Technology (DSPT) is the significant mechanism and the mainstay for compiling and computing the large amount of data as well as the way to collect and process the raw data to call it information. There are varieties of DSPT like Apache Spark, Flink, Kafka, Storm, Samza, Hadoop, Atlas.ti, Cassandra, etc. This paper aims at comparing the five well- known and widely used open source big data DSPT (i.e., Apache Spark, Flink, Kafka, Storm, and Samza). An extensive comparison will be performed based on 12 different yet interconnected standards. A matrix has been designed through which five different experiments were executed, based on which the juxtaposition will be prepared. This paper summarizes an extensive study of open source big data DPST with a practical experimental approach in a well-controlled and sophisticated environment

Full Text