Abstract
Distributed file systems, which manage large amounts of data over multiple commercially available machines, have attracted attention as a management and processing system for big data applications. A distributed file system consists of multiple data nodes and provides reliability and availability by holding multiple replicas of data. Due to system failure or maintenance, a data node may be removed from the system and the data blocks the removed data node held are lost. If data blocks are missing, the access load of the other data nodes that hold the lost data blocks increases, and as a result the performance of data processing over the distributed file system decreases. Therefore, replica reconstruction is an important issue to reallocate the missing data blocks in order to prevent such performance degradation. The Hadoop Distributed File System (HDFS) is a widely used distributed file system. In the HDFS replica reconstruction process, source and destination data nodes for replication are selected randomly. We found that this replica reconstruction scheme is inefficient because data transfer is biased. Therefore, we propose two more effective replica reconstruction schemes that aim to balance the workloads of replication processes. Our proposed replication scheduling strategy assumes that nodes are arranged in a ring and data blocks are transferred based on this one-directional ring structure to minimize the difference of the amount of transfer data of each node. Based on this strategy, we propose two replica reconstruction schemes, an optimization scheme and a heuristic scheme. We have implemented the proposed schemes in HDFS and evaluated them on an actual HDFS cluster. From the experiments, we confirm that the replica reconstruction throughput of the proposed schemes show a 45% improvement compared to that of the default scheme. We also verify that the heuristic scheme is effective because it shows performance comparable to the optimization scheme and can be more scalable than the optimization scheme.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.