Abstract

The Hadoop Distributed File System (HDFS) is a widely used distributed file system, but it handles large numbers of small files inefficiently: traditional approaches suffer from high resource consumption (each file's metadata occupies NameNode memory) and poor performance. To address this problem, this paper proposes a novel approach to small-file processing that runs as an engine independent of HDFS and effectively reduces HDFS overhead. The engine builds its server on Reactor-style multiplexed IO and uses non-blocking IO to merge and read small files; it also maintains a cache of small files to make reads more efficient. The paper presents a small-file processing strategy for efficient merging, which builds a file index and uses a boundary-file block-filling mechanism to accomplish file separation and retrieval. Experimental results show that the proposed approach improves the efficiency of storing and processing massive numbers of small files in HDFS.
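The abstract describes the merge-plus-index strategy only at a high level. As a rough illustration of the general idea, the following Java NIO sketch merges small files into one container file and records each file's (offset, length) in an index so a single positioned read can retrieve any small file later. This is not the paper's actual engine: the class and method names (SmallFileMerger, merge, read) are hypothetical, the index is kept in memory rather than persisted, and the boundary-file block-filling mechanism is omitted.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal sketch (hypothetical, not the paper's implementation): merge
 * small files into one container file and record each file's location
 * in an index, so one positioned read retrieves any small file.
 */
public class SmallFileMerger {
    /** Index entry: offset and length of a small file inside the container. */
    record Extent(long offset, long length) {}

    private final Map<String, Extent> index = new LinkedHashMap<>();

    /** Append each small file to the container using NIO channel transfers. */
    public void merge(Path container, Iterable<Path> smallFiles) throws IOException {
        try (FileChannel out = FileChannel.open(container,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path f : smallFiles) {
                long offset = out.position();
                try (FileChannel in = FileChannel.open(f, StandardOpenOption.READ)) {
                    long length = in.size();
                    long done = 0;
                    while (done < length) {
                        // transferTo writes into 'out' at its current position
                        done += in.transferTo(done, length - done, out);
                    }
                    index.put(f.getFileName().toString(), new Extent(offset, length));
                }
            }
        }
    }

    /** Read one small file back with positioned reads; no seek of shared state. */
    public byte[] read(Path container, String name) throws IOException {
        Extent e = index.get(name);
        if (e == null) throw new NoSuchFileException(name);
        ByteBuffer buf = ByteBuffer.allocate((int) e.length());
        try (FileChannel in = FileChannel.open(container, StandardOpenOption.READ)) {
            while (buf.hasRemaining()) {
                int n = in.read(buf, e.offset() + buf.position());
                if (n < 0) break; // unexpected end of container
            }
        }
        return buf.array();
    }
}
```

Merging many small files into one container is the standard remedy for NameNode memory pressure (HDFS itself offers HAR files and SequenceFiles for this purpose); the positioned read in read() mirrors how an index entry lets the engine serve a small file without scanning the container.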
