Abstract

With advances in internet technology, the volume of data generated every day has grown dramatically. Industries such as hospitality, defense, railways, health care, social media, and education produce many types of raw and processed data in significant quantities, and each has its own reasons for protecting its data and treating it as critical. Such large volumes of data must be stored and secured; this is the problem space known as Big Data. A Data Stream Processing Technology (DSPT) is the core mechanism for collecting and processing large amounts of raw data and turning it into information. A variety of DSPTs exist, including Apache Spark, Flink, Kafka, Storm, Samza, Hadoop, Atlas.ti, and Cassandra. This paper compares five well-known and widely used open source big data DSPTs: Apache Spark, Flink, Kafka, Storm, and Samza. An extensive comparison is performed against 12 distinct yet interconnected criteria. A matrix was designed through which five experiments were executed, and the comparison is based on their results. The paper summarizes an extensive study of open source big data DSPTs using a practical, experimental approach in a well-controlled environment.
