Abstract

The Hadoop Distributed File System (HDFS) is a widely used distributed file system, but it handles large numbers of small files inefficiently: traditional approaches suffer from high resource consumption (each file's metadata occupies NameNode memory) and poor performance. To address this problem, this paper proposes a novel approach to small-file processing that runs as an engine independent of HDFS and effectively reduces HDFS overhead. The engine builds its server on Reactor-style multiplexed IO and uses non-blocking IO to merge and read small files; it also maintains a cache of small files to make reads more efficient. The paper presents a small-file processing strategy for efficient merging, which builds a file index and uses a boundary-file block-filling mechanism to accomplish file separation and retrieval. Experimental results show that the proposed approach improves the efficiency of storing and processing massive numbers of small files in HDFS.
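The abstract describes the merge-plus-index strategy only at a high level. As a rough illustration of the general idea, the following Java NIO sketch merges small files into one container file and records each file's (offset, length) in an index so a single positioned read can retrieve any small file later. This is not the paper's actual engine: the class and method names (SmallFileMerger, merge, read) are hypothetical, the index is kept in memory rather than persisted, and the boundary-file block-filling mechanism is omitted.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal sketch (hypothetical, not the paper's implementation): merge
 * small files into one container file and record each file's location
 * in an index, so one positioned read retrieves any small file.
 */
public class SmallFileMerger {
    /** Index entry: offset and length of a small file inside the container. */
    record Extent(long offset, long length) {}

    private final Map<String, Extent> index = new LinkedHashMap<>();

    /** Append each small file to the container using NIO channel transfers. */
    public void merge(Path container, Iterable<Path> smallFiles) throws IOException {
        try (FileChannel out = FileChannel.open(container,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path f : smallFiles) {
                long offset = out.position();
                try (FileChannel in = FileChannel.open(f, StandardOpenOption.READ)) {
                    long length = in.size();
                    long done = 0;
                    while (done < length) {
                        // transferTo writes into 'out' at its current position
                        done += in.transferTo(done, length - done, out);
                    }
                    index.put(f.getFileName().toString(), new Extent(offset, length));
                }
            }
        }
    }

    /** Read one small file back with positioned reads; no seek of shared state. */
    public byte[] read(Path container, String name) throws IOException {
        Extent e = index.get(name);
        if (e == null) throw new NoSuchFileException(name);
        ByteBuffer buf = ByteBuffer.allocate((int) e.length());
        try (FileChannel in = FileChannel.open(container, StandardOpenOption.READ)) {
            while (buf.hasRemaining()) {
                int n = in.read(buf, e.offset() + buf.position());
                if (n < 0) break; // unexpected end of container
            }
        }
        return buf.array();
    }
}
```

Merging many small files into one container is the standard remedy for NameNode memory pressure (HDFS itself offers HAR files and SequenceFiles for this purpose); the positioned read in read() mirrors how an index entry lets the engine serve a small file without scanning the container.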
