Abstract

Data analysis has become challenging in recent years as the volume of generated data has grown difficult to manage; more hardware and software resources are therefore needed to store and process this huge amount of data. Apache Hadoop is a free framework, widely used thanks to the Hadoop Distributed File System (HDFS) and its ability to integrate with other data processing and analysis components, such as MapReduce for data processing, Spark for in-memory data processing, Apache Drill for SQL on Hadoop, and many others. In this paper, we analyze the Hadoop framework implementation through a comparative study of Single-node and Multi-node Hadoop clusters. We explain in detail the two layers at the base of the Hadoop architecture: the HDFS layer, with its NameNode, Secondary NameNode, and DataNode daemons, and the MapReduce layer, with its JobTracker and TaskTracker daemons. This work is part of a larger effort aiming to perform data processing in Data Lake structures.

Highlights

  • Before the term Big Data appeared, about 15 years ago, there were few possibilities to process data sets of terabytes or more

  • After installing and configuring a Single-node cluster and starting its ssh processes, launching the jps command (a Java Virtual Machine process status tool) shows the status of all Hadoop daemons currently running on the machine, such as the NameNode, Secondary NameNode, JobTracker, TaskTracker, and DataNode

  • All Hadoop daemons (NameNode, DataNode, Secondary NameNode, JobTracker, TaskTracker) run on one single machine
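The single-node check described above can be sketched as a short shell session. Script names and paths are illustrative and vary with the Hadoop version; the JobTracker and TaskTracker daemons listed here belong to the Hadoop 1.x MapReduce layer, where start-all.sh launches both the HDFS and MapReduce daemons:

```shell
# Start all Hadoop daemons on the single-node cluster
# (assumes HADOOP_HOME points at a Hadoop 1.x installation)
$HADOOP_HOME/bin/start-all.sh

# List the running JVM processes; on a healthy single-node cluster
# all five Hadoop daemons appear alongside the jps process itself
jps
```

On a correctly configured machine, the jps listing is expected to include NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker; a missing daemon usually points to a configuration or ssh problem.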


Summary

INTRODUCTION

Before the term Big Data appeared, about 15 years ago, there were few possibilities to process data sets of terabytes or more. Doug Cutting had begun working on a new open-source implementation based on the ideas suggested by Google, and so Hadoop was born. Hadoop is a distributed processing software framework that can process both small and large volumes of data across clusters of computers. It is recommended for large data sets because it can scale from a single server to hundreds. Data are read in parallel, so the time required for this operation is substantially reduced. Another important feature of Hadoop is that it is based on the "write once, read many times" model.
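The "write once, read many times" model can be illustrated with the HDFS file system shell: a file is written into HDFS once and then read repeatedly by any number of jobs, but existing file contents are not modified in place. The file and directory names below are hypothetical, and the hadoop fs command assumes a running HDFS installation:

```shell
# Write once: copy a local data set into HDFS
# (paths are illustrative examples, not from the paper)
hadoop fs -put access_log.txt /data/logs/access_log.txt

# Read many times: the same file can now be read in parallel
# by any number of clients or MapReduce jobs
hadoop fs -cat /data/logs/access_log.txt | head

# List the directory to confirm the file is stored in HDFS
hadoop fs -ls /data/logs
```

Because HDFS files are immutable once written, updating a data set typically means writing a new file rather than editing the old one, which is what allows many readers to scan the data in parallel safely.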

RELATED WORKS
HADOOP ARCHITECTURE
HADOOP
Single-Node Cluster
Multi-Node Cluster
CONCLUSIONS

