Abstract

HDFS (Hadoop Distributed File System) is designed to store very large datasets reliably and to stream them to applications at high bandwidth, achieving fault tolerance through data replication. Much research has addressed placing data at the most suitable location, yet the storage space consumed by replicas remains a difficult problem that degrades file-system performance. To overcome this issue, the proposed system performs data replication based on access count estimation within the Hadoop framework: it creates replicas according to how frequently data is accessed, mitigates the data locality problem through improved placement of those replicas, and assigns MapReduce tasks to efficient workers to obtain better results. An experiment using a benchmark compared the proposed technique against the default HDFS replication policy and previously published replication techniques; the results show that the proposed method achieves higher throughput than the earlier approaches.
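
The abstract does not give the paper's estimation formula, so the following is only a minimal sketch of the general idea of access-count-driven replication, written against the standard Hadoop FileSystem API. The class name AccessCountReplicator, the thresholds, and the linear scaling rule (one extra replica per fixed number of accesses) are all illustrative assumptions, not the authors' method; only FileSystem.setReplication is the real HDFS call for changing a file's replica count.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Illustrative sketch: raise the replication factor of frequently
 * accessed files. Bounds and scaling step are assumed values.
 */
public class AccessCountReplicator {
    private static final short MIN_REPLICAS = 2;          // assumed lower bound
    private static final short MAX_REPLICAS = 6;          // assumed upper bound
    private static final long ACCESSES_PER_REPLICA = 100; // assumed scaling step

    private final FileSystem fs;
    private final Map<String, Long> accessCounts = new ConcurrentHashMap<>();

    public AccessCountReplicator(Configuration conf) throws IOException {
        this.fs = FileSystem.get(conf);
    }

    /** Record one read of the given file (hooked into the client layer). */
    public void recordAccess(Path file) {
        accessCounts.merge(file.toString(), 1L, Long::sum);
    }

    /** Map an access count to a replication factor, clamped to the bounds. */
    short targetReplication(long accesses) {
        long factor = MIN_REPLICAS + accesses / ACCESSES_PER_REPLICA;
        return (short) Math.min(MAX_REPLICAS, factor);
    }

    /** Periodically re-evaluate hot files and ask HDFS to re-replicate them. */
    public void rebalance() throws IOException {
        for (Map.Entry<String, Long> e : accessCounts.entrySet()) {
            Path file = new Path(e.getKey());
            short target = targetReplication(e.getValue());
            if (fs.getFileStatus(file).getReplication() != target) {
                // setReplication() asks the NameNode to add or remove
                // replicas of an existing file.
                fs.setReplication(file, target);
            }
        }
    }
}
```

In a scheme like this, the NameNode handles the actual block copying once setReplication is called, so the estimator only has to decide the target replica count; where those extra replicas are placed (the data locality aspect the paper addresses) would require a custom block placement policy, which is not shown here.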
