Abstract
The Hadoop Distributed File System (HDFS) is designed to store and manage very large files reliably and at low cost. Because HDFS relies on a master (the NameNode) to manage the metadata for many slaves (the DataNodes), the NameNode often becomes a bottleneck, especially when handling a large number of small files. A common solution to this problem is to merge many small files into one large file. To address the problem of massive numbers of small files and to improve the efficiency of accessing them, this paper defines the Logic File Name (LFN) and proposes a Small file Merge Strategy Based on LFN (SMSBL). SMSBL offers a new perspective based on the file system hierarchy: by merging small files according to different levels of that hierarchy, it effectively increases the correlation among the small files stored in the same HDFS block. As a result, HDFS that adopts SMSBL together with a prefetching mechanism achieves high performance when handling large numbers of small files. We also establish a system efficiency analysis model. Experimental results demonstrate that SMSBL can solve the small file problem in HDFS and achieves a high hit rate for prefetched files.
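The abstract does not define LFN, so the following is not SMSBL itself. It is a minimal Python sketch, under stated assumptions, of the general idea described above: small files are grouped by a key derived from the file system hierarchy, packed into block-sized containers with an offset index, and, on the first access to any file in a container, every file in that container is prefetched so that the hit rate can be measured. The grouping key (`logical_key`, here simply the parent directory), the helpers `merge_small_files` and `prefetch_hit_rate`, the `MergedBlock` type, and the purely in-memory model are all hypothetical illustrations.

```python
import posixpath
from collections import defaultdict
from dataclasses import dataclass, field

# Default HDFS block size (dfs.blocksize) in Hadoop 2.x and later: 128 MB.
BLOCK_SIZE = 128 * 1024 * 1024


@dataclass
class MergedBlock:
    """Hypothetical container holding several small files in one block."""
    block_id: int
    used: int = 0
    # Maps original file path -> (offset inside the merged block, length).
    index: dict = field(default_factory=dict)


def logical_key(path: str) -> str:
    # Hypothetical grouping key: the parent directory.
    # This is a stand-in for the paper's LFN, NOT its actual definition.
    return posixpath.dirname(path)


def merge_small_files(files: dict[str, int], block_size: int = BLOCK_SIZE) -> list[MergedBlock]:
    """Pack files (path -> size in bytes) into blocks, keeping same-key files adjacent."""
    groups = defaultdict(list)
    for path in sorted(files):
        groups[logical_key(path)].append(path)

    blocks: list[MergedBlock] = []
    current = None
    for key in sorted(groups):
        for path in groups[key]:
            size = files[path]
            # Start a new block when the current one cannot hold this file.
            if current is None or (current.used > 0 and current.used + size > block_size):
                current = MergedBlock(block_id=len(blocks))
                blocks.append(current)
            current.index[path] = (current.used, size)
            current.used += size
    return blocks


def prefetch_hit_rate(blocks: list[MergedBlock], accesses: list[str]) -> float:
    """Replay an access trace; on a miss, prefetch every file in the same block."""
    block_of = {path: block for block in blocks for path in block.index}
    cache: set[str] = set()
    hits = 0
    for path in accesses:
        if path in cache:
            hits += 1
        else:
            cache.update(block_of[path].index)  # whole-block prefetch
    return hits / len(accesses) if accesses else 0.0


if __name__ == "__main__":
    demo = {"/a/1": 10, "/a/2": 10, "/b/1": 10, "/b/2": 10}
    merged = merge_small_files(demo, block_size=20)
    print(len(merged), prefetch_hit_rate(merged, ["/a/1", "/a/2", "/b/1", "/b/2"]))
```

Running the demo prints `2 0.5`: files from `/a` and `/b` land in separate blocks, so the first read in each directory is a miss and the second is served from the prefetched block. Placing files that tend to be accessed together in the same block is what makes whole-block prefetching pay off; a key that scattered related files across blocks would yield fewer hits for the same trace.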