Abstract

Distributed file systems, which enable users to manage large amounts of data across multiple commodity computers, have attracted attention as a potential management and processing platform for big data applications. The Hadoop Distributed File System (HDFS) is a widely used open source distributed file system. In HDFS, multiple replicas of each block are stored on separate data nodes for enhanced availability. When a data node failure is detected, replica reconstruction is performed. During this process, the access load on the other data nodes, which hold replicas of the lost data blocks, may increase, so that the overall performance of data processing over the distributed file system decreases. Effective replica reconstruction is therefore an important issue in preventing such performance degradation. In addition, an HDFS cluster composed of multiple racks must replicate some of the missing blocks on a different rack, in accordance with the HDFS replica placement policy, to preserve availability. For blocks that require data transfer between racks in the cluster, both network bandwidth and fault tolerance have to be taken into account. In this paper, we propose replica reconstruction schemes for a multi-rack HDFS cluster and evaluate their effectiveness in multi-rack cluster environments by simulation. In the proposed schemes, data transfer within a rack is performed over a one-directional ring structure, and inter-rack data transfer is performed in a round-robin manner. We control the streams between racks by giving priority to the blocks that require inter-rack transfer. The experiments show that the proposed schemes are effective in reducing execution time and improving fault tolerance. We also confirm that performance improves further when the number of streams between racks is controlled properly, and that the proposed schemes reduce execution time by 16% compared with the default scheme.
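
The transfer pattern described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`ring_successor`, `plan_transfers`), the block tuple layout, and the way a target node is picked inside the destination rack are all assumptions made for illustration. It assumes `racks` lists only surviving data nodes, and it does not model the limit on the number of inter-rack streams.

```python
from itertools import cycle


def ring_successor(nodes, source):
    """Return the node that follows `source` in a one-directional ring."""
    i = nodes.index(source)
    return nodes[(i + 1) % len(nodes)]


def plan_transfers(blocks, racks):
    """Plan (block_id, source, target) transfers.

    blocks: list of (block_id, source_node, needs_inter_rack) tuples
            (hypothetical layout).
    racks:  dict mapping rack name -> list of surviving data nodes.
    """
    node_rack = {n: r for r, nodes in racks.items() for n in nodes}
    rack_cycle = cycle(racks)            # round-robin order over rack names
    next_slot = {r: 0 for r in racks}    # assumed per-rack cursor over nodes

    # Inter-rack blocks are scheduled first (priority); the sort is stable,
    # so the original order is kept within each group.
    ordered = sorted(blocks, key=lambda b: not b[2])

    plan = []
    for block_id, source, inter_rack in ordered:
        src_rack = node_rack[source]
        if inter_rack:
            if len(racks) < 2:
                raise ValueError("inter-rack transfer needs at least two racks")
            # Round robin over racks, skipping the source rack.
            dst_rack = next(r for r in rack_cycle if r != src_rack)
            nodes = racks[dst_rack]
            target = nodes[next_slot[dst_rack] % len(nodes)]
            next_slot[dst_rack] += 1
        else:
            # Intra-rack transfer follows the one-directional ring.
            target = ring_successor(racks[src_rack], source)
        plan.append((block_id, source, target))
    return plan
```

In this sketch, each intra-rack sender has exactly one receiver (its ring successor), which spreads the reconstruction load evenly inside a rack, while blocks that must cross racks are placed at the front of the plan.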
