A Critical Analysis of Apache Hadoop and Spark for Big Data Processing

Piyush Sewal, Hari Singh

doi:10.1109/ispcc53510.2021.9609518

Abstract

Big data processing platforms that can work globally in an integrated manner and process huge datasets efficiently have become very significant. This paper presents a critical analysis of two such platforms, Apache Hadoop MapReduce and Apache Spark. Hadoop MapReduce was earlier one of the most popular platforms for batch processing of huge datasets, but as the nature of data has shifted from static to dynamic, Apache Spark has proved better suited to iterative jobs and live data streams. The paper critically compares and analyzes Hadoop 1.x, 2.x, and 3.x and Spark 1.x, 2.x, and 3.x on well-known key parameters such as components, storage system, resource management, fault tolerance, data processing, scalability, and performance.

Full Text