Abstract

Hadoop is an open-source Java software framework that supports data-intensive distributed applications. It enables applications to work with thousands of nodes and petabytes of data. The two major components of Hadoop are HDFS and MapReduce. HDFS uses two types of machines: the DataNode (slave machine), which stores the application's data, and the NameNode (master machine), which stores the file system metadata. Because the NameNode is the only machine that stores the file system metadata, it is a Single Point of Failure (SPOF) for HDFS, and this affects the overall availability of Hadoop. When the NameNode goes down, the entire system goes offline and cannot perform any operation until the NameNode is restarted. Because that restart must be done manually, the system becomes less available. This paper proposes a highly available architecture, together with its working principle, that protects the HDFS NameNode against its SPOF by using the well-known Two-Phase Commit (2PC) protocol and bully election with a time synchronization mechanism.

Keywords - Hadoop, HDFS, Two Phase Commit Protocol, Berkeley algorithm for time synchronization, NameNode.
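
The abstract names 2PC and bully election but does not spell out how they are applied, so the following is only a minimal textbook-style sketch of the two ideas, not the architecture proposed in the paper. The `Replica` class, its fields, and the functions `two_phase_commit` and `bully_elect` are hypothetical names introduced for illustration; they are not Hadoop APIs. The election is reduced to its outcome (the live node with the highest ID wins) and omits the message exchange and the time synchronization step.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for a NameNode replica (not a Hadoop class)."""
    node_id: int
    alive: bool = True
    metadata: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)

    def prepare(self, update: dict) -> bool:
        # Phase 1: stage the update and vote; a dead replica votes "no".
        if not self.alive:
            return False
        self.staged = dict(update)
        return True

    def commit(self) -> None:
        # Phase 2 (commit): apply the staged update to the metadata.
        self.metadata.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        # Phase 2 (abort): discard the staged update.
        self.staged = {}


def two_phase_commit(replicas, update) -> bool:
    """Apply `update` on every replica, or on none of them."""
    votes = [r.prepare(update) for r in replicas]
    if all(votes):
        for r in replicas:
            r.commit()
        return True
    for r in replicas:
        r.abort()
    return False


def bully_elect(replicas):
    """Outcome of a bully election: the live replica with the highest ID."""
    live = [r for r in replicas if r.alive]
    return max(live, key=lambda r: r.node_id) if live else None
```

With three replicas, a metadata update commits only while all of them vote yes; once the replica with the highest ID is marked as failed, the next update is aborted everywhere and `bully_elect` picks the highest-ID survivor as the new master.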
