Abstract
In this paper, we propose an erasure-coded data archival system called aHDFS for Hadoop clusters, where $RS(k+r,k)$ codes are employed to archive data replicas in the Hadoop distributed file system (HDFS). We develop two archival strategies (i.e., aHDFS-Grouping and aHDFS-Pipeline) in aHDFS to speed up the data archival process. aHDFS-Grouping, a MapReduce-based data archiving scheme, keeps each mapper's intermediate key-value pairs in a local key-value store. With the local store in place, aHDFS-Grouping merges all intermediate key-value pairs sharing the same key into a single key-value pair, which is then shuffled to reducers to generate the final parity blocks. aHDFS-Pipeline forms a data archival pipeline across multiple data nodes in a Hadoop cluster; each node delivers its merged key-value pair to the local key-value store of the next node, and the last node in the pipeline outputs the parity blocks. We implement aHDFS in a real-world Hadoop cluster. The experimental results show that aHDFS-Grouping and aHDFS-Pipeline speed up Baseline's shuffle and reduce phases by factors of 10 and 5, respectively. When the block size is larger than 32 MB, aHDFS improves the performance of HDFS-RAID and HDFS-EC by approximately 31.8 and 15.7 percent, respectively.
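To make the local merge step concrete, the sketch below illustrates how intermediate pairs that share a parity-block key could be folded into a single pair before the shuffle phase. This is a minimal illustration, not the paper's implementation: the class and method names (LocalParityStore, accumulate) are hypothetical. It relies only on the standard property that addition in GF(2^8), the field used by $RS(k+r,k)$ codes, is byte-wise XOR, so partial parity contributions can be combined incrementally.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the local merge step described in the abstract:
 * each mapper emits (parityBlockId, partialParity) pairs, and pairs that
 * share a key are folded into one pair before being shuffled to a reducer.
 */
public class LocalParityStore {
    // One running partial parity per parity-block key.
    private final Map<String, byte[]> store = new HashMap<>();

    /**
     * Merge a new partial parity into the stored value for this key.
     * Each parity byte in an RS code is a GF(2^8) linear combination of
     * data bytes; since GF(2^8) addition is XOR, partial contributions
     * combine byte-wise with XOR.
     */
    public void accumulate(String parityKey, byte[] partialParity) {
        byte[] acc = store.get(parityKey);
        if (acc == null) {
            store.put(parityKey, partialParity.clone());
            return;
        }
        for (int i = 0; i < acc.length; i++) {
            acc[i] ^= partialParity[i];
        }
    }

    /** The single merged key-value pair that would be shuffled onward. */
    public byte[] mergedValue(String parityKey) {
        return store.get(parityKey);
    }
}
```

Under this reading, the same accumulate-then-forward step also fits the pipeline strategy: each data node merges its contribution locally and passes the running value to the next node's store, with the last node emitting the finished parity block.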