Abstract
The Hadoop Distributed File System (HDFS) was originally designed to store large files and has been widely used across the big-data ecosystem. However, it can suffer from serious performance issues when handling a large number of small files. In this paper, we propose a novel archive system, referred to as Small File Merger (SFM), to solve the small file problem in HDFS. The key idea is to combine small files into large ones and build an index for accessing the original files. Unlike traditional archive systems such as Hadoop Archives (HAR), SFM allows archived files to be modified directly without re-archiving. Considering that most reads in HDFS are sequential, we design an adaptive readahead strategy based on the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm to maximize read performance. Furthermore, our system provides an HDFS-compatible interface, which can be used directly without recompiling or redeploying the existing HDFS cluster, hence facilitating convenient deployment in practice. Preliminary experimental results show that our system achieves better performance than existing methods.
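To make the SPSA-based tuning idea concrete, the sketch below shows how a scalar readahead parameter (e.g., a readahead window size) could be adjusted with one standard SPSA iteration. This is only an illustrative sketch, not the paper's actual implementation: the function name spsa_step, the measure_cost callback, and the gain constants are assumptions introduced here for exposition.

```python
import random

def spsa_step(theta, k, measure_cost,
              a=0.5, c=0.1, A=10, alpha=0.602, gamma=0.101):
    """One SPSA iteration on a scalar parameter theta (hypothetical
    readahead window size). measure_cost(theta) is assumed to return an
    observed cost, e.g., average read latency under that setting."""
    a_k = a / (k + 1 + A) ** alpha      # decaying step-size gain
    c_k = c / (k + 1) ** gamma          # decaying perturbation size
    delta = random.choice([-1.0, 1.0])  # Rademacher perturbation

    # Two noisy cost measurements at symmetrically perturbed settings.
    cost_plus = measure_cost(theta + c_k * delta)
    cost_minus = measure_cost(theta - c_k * delta)

    # Simultaneous-perturbation gradient estimate, then gradient descent.
    grad_est = (cost_plus - cost_minus) / (2.0 * c_k * delta)
    return theta - a_k * grad_est
```

Under these assumptions, each tuning step needs only two cost observations regardless of how many parameters are tuned, which is the usual motivation for choosing SPSA in an online setting such as readahead adaptation.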