A Study of Effective Replica Reconstruction Schemes for the Hadoop Distributed File System

Asami Higai, Atsuko Takefusa, Hidemoto Nakada, Masato Oguchi

doi:10.1587/transinf.2014edp7242

Abstract

Distributed file systems, which manage large amounts of data over multiple commercially available machines, have attracted attention as management and processing systems for Big Data applications. A distributed file system consists of multiple data nodes and provides reliability and availability by holding multiple replicas of data. When a data node is removed from the system because of failure or maintenance, the data blocks it held are lost. While those blocks are missing, the access load on the other data nodes that hold replicas of the lost blocks increases, and as a result, the performance of data processing over the distributed file system decreases. Replica reconstruction, which reallocates the missing data blocks, is therefore important for preventing such performance degradation. The Hadoop Distributed File System (HDFS) is a widely used distributed file system. In the HDFS replica reconstruction process, source and destination data nodes for replication are selected randomly. We find that this replica reconstruction scheme is inefficient because data transfer is biased. Therefore, we propose two more effective replica reconstruction schemes that aim to balance the workloads of replication processes. Our proposed replication scheduling strategy assumes that nodes are arranged in a ring, and data blocks are transferred along this one-directional ring structure to minimize the difference in the amount of data transferred by each node. Based on this strategy, we propose two replica reconstruction schemes: an optimization scheme and a heuristic scheme. We implemented the proposed schemes in HDFS and evaluated them on an actual HDFS cluster. We also conducted experiments on a large-scale environment by simulation. From the experiments in the actual environment, we confirm that the replica reconstruction throughputs of the proposed schemes show a 45% improvement compared to the HDFS default scheme. We also verify that the heuristic scheme is effective because it shows performance comparable to the optimization scheme. Furthermore, the experimental results on the large-scale simulation environment show that the optimization scheme is unrealistic at that scale because a long time is required to find the optimal solution, whereas the heuristic scheme scales well and improved replica reconstruction throughput by up to 25% compared to the default scheme.
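The ring-based idea in the abstract can be illustrated with a small sketch. This is not the authors' optimization or heuristic scheme, which the abstract does not specify; it is a hypothetical greedy rule that only shows why one-directional ring transfers tend to balance load. The function names (`random_schedule`, `ring_schedule`, `load_spread`), the node and block labels, and the assumption that every lost block still has at least one surviving holder and fewer holders than there are nodes are all illustrative.

```python
import random
from collections import Counter


def random_schedule(blocks, nodes, rng):
    """Baseline modeled on the abstract's description of default HDFS:
    random source among surviving holders, random destination among the rest."""
    plan = []
    for block_id, holders in blocks.items():
        src = rng.choice(sorted(holders))
        dst = rng.choice([n for n in nodes if n not in holders])
        plan.append((block_id, src, dst))
    return plan


def ring_schedule(blocks, nodes):
    """Hypothetical greedy rule: every transfer goes from a holder to its
    successor on the ring, choosing the eligible holder with the fewest sends."""
    succ = {n: nodes[(i + 1) % len(nodes)] for i, n in enumerate(nodes)}
    sends = Counter()
    plan = []
    for block_id, holders in blocks.items():
        # A holder is eligible only if its successor lacks the block.
        candidates = [h for h in sorted(holders) if succ[h] not in holders]
        src = min(candidates, key=lambda h: sends[h])
        sends[src] += 1
        plan.append((block_id, src, succ[src]))
    return plan


def load_spread(plan, nodes):
    """Difference between the busiest and idlest node (sends plus receives)."""
    sends = Counter(src for _, src, _ in plan)
    recvs = Counter(dst for _, _, dst in plan)
    loads = [sends[n] + recvs[n] for n in nodes]
    return max(loads) - min(loads)


# Illustrative data: four surviving nodes, four blocks with two holders each.
nodes = ["n0", "n1", "n2", "n3"]
blocks = {
    "b0": {"n0", "n1"},
    "b1": {"n1", "n2"},
    "b2": {"n2", "n3"},
    "b3": {"n0", "n3"},
}
ring_plan = ring_schedule(blocks, nodes)
random_plan = random_schedule(blocks, nodes, random.Random(0))
print("ring spread:", load_spread(ring_plan, nodes))
print("random spread:", load_spread(random_plan, nodes))
```

Because each node sends only to its ring successor, the number of blocks a node receives equals the number its predecessor sends, so balancing sends also balances receives. In this toy input the ring plan gives every node exactly one send and one receive; the random baseline's spread depends on the seed.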

Journal: IEICE Transactions on Information and Systems
Publication Date: Jan 1, 2015
Citations: 4
License type: free

Similar Papers

A Study of Effective Replica Reconstruction Schemes at Node Deletion for HDFS
Asami Higai ... Hidemoto Nakada
01 May 2014

A Study of Replica Reconstruction Schemes for Multi-rack HDFS Clusters
Asami Higai ... Atsuko Takefusa
01 Dec 2014

Locality Sensitive Hashing based incremental clustering for creating affinity groups in Hadoop — HDFS - An infrastructure extension
A Kala Karun ... K Chitharanjan
01 Mar 2013

ERP: An enhanced read policy for HDFS to improve read performance for files under construction
Junjie He ... Fei Hu
01 Dec 2015
