Abstract

Hadoop is currently among the most popular big data processing architectures, providing a framework for managing and analyzing massive data. The Hadoop Distributed File System (HDFS) is the core storage component of Hadoop. It performs well when storing and managing large files but performs poorly when handling a large number of small files: their metadata consumes a large amount of NameNode memory, and access to individual small files is inefficient. Archive files have been proposed to solve this problem. They combine small files into larger merged files and use index files to record the relevant information about the small files in the merged files. However, existing archive formats (e.g., Hadoop Archive, MapFile) all require additional processing and multiple I/O operations to obtain the index information before the file content can actually be accessed, which reduces access efficiency. This paper proposes a novel distributed file storing mechanism called the B+ Tree-Based Distributed File Storing mechanism (BTDFS), which combines a large number of small files into a merged file, records the index information of the small files in the merged file, and organizes that index information as a B+ tree. BTDFS reduces the metadata footprint on the NameNode while improving the access performance of a single small file, and it provides additional capabilities that make it convenient to analyze large numbers of small files. Finally, this paper designs experiments for BTDFS that test and analyze NameNode memory usage and small-file access performance. The experimental results show that BTDFS effectively reduces the memory usage of the NameNode and improves the efficiency of accessing small files compared with state-of-the-art methods.
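The general idea of a merged file indexed by a B+ tree can be illustrated with a small in-memory sketch. This is not the BTDFS implementation, whose on-disk layout and HDFS integration the abstract does not describe. It assumes a static, bulk-loaded B+ tree keyed by file name, and every name in it (`FANOUT`, `merge_files`, `build_tree`, `lookup`, `read_small_file`) is hypothetical.

```python
import bisect
import io

FANOUT = 4  # hypothetical node capacity, kept tiny so the tree has several levels


def merge_files(small_files):
    """Concatenate small files into one blob; return (blob, sorted index entries)."""
    blob = io.BytesIO()
    entries = []
    for name in sorted(small_files):
        data = small_files[name]
        entries.append((name, blob.tell(), len(data)))  # (key, offset, length)
        blob.write(data)
    return blob.getvalue(), entries


def build_tree(entries):
    """Bulk-load a static B+ tree: leaves hold entries, inner nodes hold separators."""
    if not entries:
        raise ValueError("cannot index an empty merged file")
    chunks = [entries[i:i + FANOUT] for i in range(0, len(entries), FANOUT)]
    level = [{"leaf": True, "keys": [e[0] for e in c], "vals": c, "min": c[0][0]}
             for c in chunks]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), FANOUT):
            kids = level[i:i + FANOUT]
            parents.append({"leaf": False,
                            "keys": [k["min"] for k in kids[1:]],  # separator keys
                            "kids": kids,
                            "min": kids[0]["min"]})
        level = parents
    return level[0]


def lookup(root, name):
    """Descend from the root to a leaf; return (name, offset, length) or None."""
    node = root
    while not node["leaf"]:
        node = node["kids"][bisect.bisect_right(node["keys"], name)]
    i = bisect.bisect_left(node["keys"], name)
    if i < len(node["keys"]) and node["keys"][i] == name:
        return node["vals"][i]
    return None


def read_small_file(blob, root, name):
    """Resolve a small file through the index, then slice it out of the merged blob."""
    entry = lookup(root, name)
    if entry is None:
        raise KeyError(name)
    _, offset, length = entry
    return blob[offset:offset + length]
```

In this sketch a lookup visits one node per tree level and then reads a single byte range from the merged data, and the system only has to track one merged object instead of one entry per small file. Whether BTDFS stores its tree nodes or performs lookups this way is an assumption of the example.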
