Abstract

MapReduce is a powerful distributed processing model for large datasets. Hadoop is an open-source framework that implements MapReduce, and the Hadoop Distributed File System (HDFS) has become very popular for building large-scale, high-performance distributed data processing systems. HDFS is designed mainly to handle large files, so processing massive numbers of small files is a challenge for native HDFS. This paper introduces an approach to optimize the performance of processing massive small files on HDFS. We design a new HDFS structure model whose main idea is to merge small files and write them directly into the merged file at the source. Experimental results show that the proposed scheme effectively improves the storage and access efficiency of massive small files on HDFS.

Keywords: MapReduce, HDFS, Big data, Cluster
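The abstract does not detail the merging mechanism itself, but the general idea of packing many small files into one large HDFS file at the source can be illustrated with Hadoop's standard SequenceFile container format: each small file becomes one key/value record, keyed by its original name. The sketch below is a minimal illustration under that assumption; the class name SmallFileMerger and the HDFS path are hypothetical and do not reflect the paper's actual structure model.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

import java.io.File;
import java.nio.file.Files;

public class SmallFileMerger {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical destination for the merged container file on HDFS.
        Path mergedPath = new Path("hdfs:///data/merged/batch-0001.seq");

        // One writer per merged file: small files are appended as records
        // instead of being stored as individual HDFS files.
        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(mergedPath),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class));
        try {
            // args[0]: local directory of small files to merge at the source.
            for (File small : new File(args[0]).listFiles()) {
                byte[] content = Files.readAllBytes(small.toPath());
                // key = original file name, value = raw file contents.
                writer.append(new Text(small.getName()),
                              new BytesWritable(content));
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}
```

Merging at the source in this way reduces the number of entries the NameNode must track and lets MapReduce jobs read the small files sequentially from one block-aligned file, which is the storage and access benefit the abstract refers to.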
