Abstract

The Hadoop Distributed File System (HDFS) is an open-source system designed to run on commodity hardware and suited to applications with large data sets (terabytes). Because the HDFS architecture relies on a single master (NameNode) to manage metadata for multiple slaves (DataNodes), the NameNode often becomes a bottleneck, especially when handling a large number of small files. To maximize efficiency, the NameNode stores the entire HDFS metadata in its main memory, so with too many small files it can run out of memory. In this paper, we propose a mechanism based on Hadoop Archive (HAR), called New Hadoop Archive (NHAR), to improve memory utilization for metadata and to enhance the efficiency of accessing small files in HDFS. In addition, we extend HAR's capabilities to allow additional files to be inserted into existing archive files. Our experimental results show that our approach drastically improves the access efficiency of small files, outperforming HAR by up to 85.47%.
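To see why many small files exhaust NameNode memory, consider the commonly cited Hadoop rule of thumb that each file, directory, or block object consumes roughly 150 bytes of NameNode heap. The sketch below is a back-of-envelope estimate only (the exact per-object cost varies by Hadoop version); the function name and the 150-byte constant are illustrative assumptions, not figures from this paper.

```python
# Back-of-envelope estimate of NameNode heap used by HDFS metadata.
# Rule of thumb from Hadoop documentation: each file, directory, or
# block object costs roughly 150 bytes of NameNode heap (approximate;
# the real figure varies by Hadoop version and object type).
BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Estimate NameNode heap for file objects plus their block objects."""
    objects = num_files * (1 + blocks_per_file)  # one file object + its blocks
    return objects * BYTES_PER_OBJECT

# 100 million small files (one block each) need about 30 GB of heap;
# packing the same data into 10 million archive files needs about 3 GB.
small = namenode_heap_bytes(100_000_000)
packed = namenode_heap_bytes(10_000_000)
print(small / 1e9, "GB vs", packed / 1e9, "GB")
```

This is the pressure that archive-based schemes such as HAR, and the proposed NHAR, relieve: by packing many small files into fewer archive files, far fewer metadata objects live in the NameNode's heap.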
