New Solution for Small File Storage of Hadoop Based on Prefetch Mechanism

Hui Xiang Zhou, Qiao Yan Wen

doi:10.4028/www.scientific.net/amr.981.205

Abstract

Hadoop has a significant advantage in handling large files, but it is inefficient when used to handle a large number of small files, because the physical address of every Hadoop file is stored in a single Namenode. Suppose each small file is 100 bytes: a large number of such files greatly reduces the utilization of Namenode memory, and because the index directory grows with the number of files, it also lowers the speed at which users can access files. To solve this problem, this paper proposes a new solution for small file storage in Hadoop based on a prefetch mechanism. Experiments show that the solution effectively improves the memory utilization of the Namenode and significantly improves user access speed.
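To give a rough sense of the Namenode memory problem described above, the following minimal sketch compares the metadata footprint of many 100-byte files stored individually against the same data packed into one large file. It is an illustration only, not the scheme proposed in this paper. The figure of about 150 bytes of Namenode heap per file or block object is a commonly quoted rule of thumb, and the 128 MB block size, the function names, and the assumption that the per-file index of the packed file lives outside Namenode heap are all assumptions made for this example.

```python
# Assumed rule of thumb: each file object and each block object costs
# roughly 150 bytes of Namenode heap. This is an estimate, not a measurement.
BYTES_PER_OBJECT = 150
# Assumed HDFS block size of 128 MB.
BLOCK_SIZE = 128 * 1024 * 1024


def namenode_bytes_unmerged(num_files):
    """Estimated heap use when every small file is stored on its own.

    Each small file occupies one file object plus one block object.
    """
    return num_files * 2 * BYTES_PER_OBJECT


def namenode_bytes_merged(num_files, file_size, block_size=BLOCK_SIZE):
    """Estimated heap use when the small files are packed into one file.

    Assumes the index that locates each small file inside the packed
    file is kept outside Namenode heap (hypothetical simplification).
    """
    total_bytes = num_files * file_size
    # Integer ceiling division, with at least one block.
    blocks = max(1, -(-total_bytes // block_size))
    return (1 + blocks) * BYTES_PER_OBJECT


if __name__ == "__main__":
    n = 10_000_000  # ten million files of 100 bytes each
    print("unmerged:", namenode_bytes_unmerged(n), "bytes of metadata")
    print("merged:  ", namenode_bytes_merged(n, 100), "bytes of metadata")
```

Under these assumptions, ten million 100-byte files hold about 1 GB of data but need about 3 GB of Namenode metadata when stored individually, so the metadata outweighs the data itself.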
