Abstract
Hadoop, an open-source Java framework, is designed for processing big data. It has two core components: HDFS (Hadoop Distributed File System), which stores large volumes of data reliably, and MapReduce, a programming model that processes data in a parallel and distributed manner. Because Hadoop is designed for very large files, it suffers a performance penalty when dealing with a large number of small files: such files place a heavy memory burden on the NameNode of HDFS and increase the execution time of MapReduce jobs. This work introduces HDFS and the small file problem, surveys existing ways of dealing with it, and proposes a new approach for handling small files. In the proposed approach, small files are merged using the MapReduce programming model on Hadoop, while files whose size exceeds the Hadoop block size are ignored during merging. This improves Hadoop's performance in handling small files and reduces the memory the NameNode needs to store their metadata.
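The core filter-and-merge idea can be sketched with the HDFS client API. The sketch below is illustrative only and is not the paper's MapReduce-based implementation; the class name, directory paths, and single-output-file layout are assumptions made for the example.

```java
// Illustrative sketch (assumed helper, not the authors' MapReduce job):
// merge small HDFS files into one larger file, skipping any file whose
// size is already at or above the HDFS block size.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class SmallFileMerger {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path inputDir = new Path(args[0]);    // directory containing small files
        Path mergedFile = new Path(args[1]);  // merged output file
        long blockSize = fs.getDefaultBlockSize(mergedFile);

        try (FSDataOutputStream out = fs.create(mergedFile)) {
            for (FileStatus status : fs.listStatus(inputDir)) {
                // Ignore directories and files at or above the block size;
                // they do not contribute to the small-file problem.
                if (status.isDirectory() || status.getLen() >= blockSize) {
                    continue;
                }
                try (FSDataInputStream in = fs.open(status.getPath())) {
                    IOUtils.copyBytes(in, out, 4096, false);
                }
            }
        }
    }
}
```

In the paper's approach the same filtering and concatenation logic would run inside map/reduce tasks so that merging itself is parallel and distributed; the sketch only shows the selection criterion and the merge step.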