Abstract

Storing large datasets, handling data in diverse formats, and coping with data generated at high velocity are the big-data challenges that motivated the invention of Hadoop, and Hadoop addresses them well. For small files, however, its access efficiency and access time degrade, and this work proposes an improved solution for that case. A novel architecture called VFS-HDFS is designed, focusing on optimizing access to small files and offering significant improvement over the existing solutions, i.e. HDFS sequence files, HAR, and NHAR. In the proposed work, a virtual file system layer is added as a wrapper on top of the existing HDFS architecture; the research is carried out without altering HDFS itself. The drawbacks of the existing techniques, i.e. the flat-file and table-chain techniques used in HDFS HAR, NHAR, and sequence files, are overcome by a bucket-chain technique. The files to be merged into a single bucket are selected by an ensemble classifier, a combination of multiple classifiers that yields more accurate results than any single one. With the proposed system, better results are obtained than with the existing systems in terms of access efficiency for small files in HDFS.
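The abstract does not give implementation details, but the two core ideas (an ensemble vote deciding which files are merged, and a bucket holding concatenated payloads with an offset index) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual design: the names `Bucket`, `is_mergeable`, the individual voting rules, and the 4 MB threshold are all assumptions introduced here for illustration.

```python
# Hypothetical sketch of bucket-based small-file merging with an
# ensemble (majority-vote) selector. All names and thresholds are
# illustrative assumptions, not taken from the paper.
from dataclasses import dataclass, field

SMALL_FILE_LIMIT = 4 * 1024 * 1024  # assumed "small file" threshold (4 MB)

# Three toy base classifiers; each votes on whether a file should be merged.
def size_vote(name: str, size: int) -> bool:
    return size < SMALL_FILE_LIMIT

def extension_vote(name: str, size: int) -> bool:
    return name.endswith((".txt", ".log", ".xml"))

def name_vote(name: str, size: int) -> bool:
    return not name.startswith("archive_")

CLASSIFIERS = [size_vote, extension_vote, name_vote]

def is_mergeable(name: str, size: int) -> bool:
    """Majority vote of the ensemble decides whether a file joins a bucket."""
    votes = sum(c(name, size) for c in CLASSIFIERS)
    return votes * 2 > len(CLASSIFIERS)

@dataclass
class Bucket:
    """One merged bucket: concatenated payloads plus an offset index,
    so an individual small file can be read back without scanning."""
    data: bytearray = field(default_factory=bytearray)
    index: dict = field(default_factory=dict)  # name -> (offset, length)

    def add(self, name: str, payload: bytes) -> None:
        self.index[name] = (len(self.data), len(payload))
        self.data.extend(payload)

    def read(self, name: str) -> bytes:
        off, length = self.index[name]
        return bytes(self.data[off:off + length])
```

In this sketch, reading a merged file is a single index lookup plus a slice, which is the kind of constant-time access the bucket approach aims for; in the real system the bucket payload would live in an HDFS block rather than in memory.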
