Abstract

Hadoop is a distributed batch-processing infrastructure widely used for big data management. Its foundation is the Hadoop Distributed File System (HDFS). HDFS follows a master/slave architecture comprising a single Name Node and many Data Nodes: the Name Node stores the file system metadata, while the Data Nodes store the application data. Because the Name Node holds all of this metadata in memory, the number of files the file system can hold is governed by the amount of memory on the Name Node; once that memory is exhausted, there is no further way to increase cluster capacity. In this paper we apply the concept of cache memory to address the issue of Name Node scalability. The focus of this paper is to present our approach, which enhances the current architecture and ensures that the Name Node does not reach its memory threshold as quickly.
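As context for the memory bound, Hadoop documentation commonly cites a rule of thumb of roughly 150 bytes of Name Node heap per file-system object (file, directory, or block), which is what ties cluster capacity to Name Node RAM. The abstract does not specify the caching policy used; as a minimal sketch, assuming an LRU eviction scheme in which cold file metadata is spilled from Name Node memory to secondary storage, the idea might look like the following (the class `MetadataCache` and the helper `spillToDisk` are hypothetical names, not from the paper):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the paper's implementation: keep only the
// most recently used file metadata in Name Node RAM and spill cold
// entries to secondary storage on eviction.
public class MetadataCache extends LinkedHashMap<String, String> {

    private final int capacity; // max entries kept in memory

    public MetadataCache(int capacity) {
        // accessOrder = true gives least-recently-used iteration order
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        if (size() > capacity) {
            spillToDisk(eldest.getKey(), eldest.getValue());
            return true; // evict the coldest entry from memory
        }
        return false;
    }

    // Placeholder (assumed helper): persist an evicted record so it can
    // be reloaded on a later cache miss instead of occupying RAM.
    private void spillToDisk(String path, String metadata) {
        System.out.println("spilled " + path + " -> " + metadata);
    }

    public static void main(String[] args) {
        MetadataCache cache = new MetadataCache(2);
        cache.put("/data/a.txt", "blocks=[b1,b2]");
        cache.put("/data/b.txt", "blocks=[b3]");
        cache.put("/data/c.txt", "blocks=[b4]"); // spills /data/a.txt
        System.out.println(cache.keySet());      // [/data/b.txt, /data/c.txt]
    }
}
```

In such a design, the choice of which metadata to keep hot and where evicted entries are spilled would determine how far the Name Node's memory ceiling is pushed back.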
