A COMPREHENSIVE STUDY ON BIG DATA FRAMEWORKS

Nagham A Sultan, Dhuha B Abdullah

doi:10.47832/2717-8234.14.4

Abstract

With the advent of cloud computing technology, the volume of data generated from various sources has increased over the last few years. Current data processing technology must handle the enormous volumes of newly created data. Consequently, studies in the literature have concentrated on big data, which comprises enormous volumes of mostly unstructured data. Dealing with such data requires well-designed frameworks that fulfil developers’ needs and suit a variety of purposes. Moreover, these frameworks can be used for storing, processing, structuring, and analyzing data. The main problem facing cloud computing developers is selecting the most suitable framework for their applications. The literature includes many works on these frameworks. However, there is still a significant gap in comprehensive studies of this crucial research area. Hence, this article presents a novel comprehensive comparison of the most popular frameworks for big data, such as Apache Hadoop, Apache Spark, Apache Flink, Apache Storm, and MongoDB. In addition, the main characteristics of each framework, in terms of advantages and drawbacks, are investigated in depth. Our research provides a comprehensive analysis of various metrics related to data processing, including data flow, computational model, overall performance, fault tolerance, scalability, interval processing, language support, latency, and processing speed. To our knowledge, no previous research has conducted a detailed study of all these characteristics simultaneously. Therefore, our study contributes significantly to the understanding of the factors that impact data processing and provides valuable insights for practitioners and researchers in the field.
