Abstract

With the rapid development of computing technologies, the volume of data is growing continuously. Data scientists are overwhelmed by this large and ever-increasing amount of data, which now requires more processing capacity. The central concern with large-scale data is supporting the decision-making process. In this study, the MapReduce programming model, together with an associated implementation introduced by Google, is applied. The model expresses a computation as two functions, Map and Reduce. The MapReduce library automatically parallelizes the computation and handles complex tasks, including data distribution, load balancing, and fault tolerance. This MapReduce implementation, originating at Google, and its open-source counterpart, Hadoop, aim to handle computation on large clusters of commodity machines. Our application of the MapReduce and Hadoop frameworks targets terabytes and petabytes of storage, with thousands of machines processing data in parallel at the same time. In this way, large-scale processing and manipulation of big data are maintained with effective results. This study presents the basics of MapReduce programming and of the open-source Hadoop framework. The Hadoop system can speed up the handling of big data and respond very quickly.
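The two functions named in the abstract, Map and Reduce, are easiest to see in the classic word-count example. The sketch below is a minimal single-process illustration, not the Hadoop API itself; the function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative, and the real framework runs the map and reduce calls in parallel across many machines while also handling distribution and fault tolerance.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit an intermediate (word, 1) pair for every word in the input split.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all intermediate values by key, as the framework does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values for one key; here, sum the word counts.
    return key, sum(values)

def mapreduce_word_count(documents):
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = mapreduce_word_count(["big data", "big clusters"])
# counts is {"big": 2, "data": 1, "clusters": 1}
```

In Hadoop, only the map and reduce logic is written by the user; the library supplies the shuffle, the parallel scheduling across the cluster, and recovery from machine failures.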

Highlights

  • With the introduction and advancement of computerized technology, data is growing at an unprecedented rate

  • As discussed in the results, big data and its requisite technologies can bring significant changes and benefits to a business

  • To handle the growth of individual companies, certain practices should be followed so that timely results can be obtained from Big Data; through its effective use, modernization and efficiency for entire sectors and economies can be achieved


Introduction

With the introduction and advancement of technology and computerized innovation, data is growing at an unimaginable rate. Data scientists and handlers are overwhelmed by this large and ever-increasing amount of data, whose processing requirements grow more demanding all the time. Such large, ever-increasing data also brings problems concerning its handling, processing, and management. Various fields face these problems when making use of large-scale data, drawing meaning from it, and using it for decision making. Large-scale internet companies, including Google, Yahoo, Facebook, and LinkedIn, as well as other major internet-solution providers, face many processing hurdles, since they must process huge volumes of data within a minimal timeframe while keeping the solution cost-effective.

