Abstract
Large-scale distributed storage systems have introduced erasure codes to guarantee high data reliability, yet inevitably at the expense of high repair costs. In practice, storage nodes are usually divided into different racks, and data blocks in storage nodes are often organized into multiple stripes, each independently encoded by an erasure code. Because cross-rack bandwidth is scarce and heterogeneous, cross-rack network transmission dominates the overall repair cost. We argue that when erasure coding is deployed in a rack architecture, existing repair techniques are limited in several aspects: they neglect the heterogeneous cross-rack bandwidth, give little consideration to multi-stripe failures, and apply no special treatment to repair link scheduling. In this paper, we present CMRepair, a Cross-rack Multi-stripe Repair technique that aims to reduce the repair time of multi-stripe failures in heterogeneous erasure-coded clusters. CMRepair first carefully chooses the nodes for reading/repairing blocks and searches for a multi-stripe repair solution. It adopts different algorithms to adjust the per-stripe solution: the Computation Time Priority (CTP) algorithm based on a greedy idea, and the Repair Time Priority (RTP) algorithm based on a meta-heuristic idea. Furthermore, CMRepair selectively schedules the execution order of cross-rack links, with the primary objective of saturating unused upload/download bandwidth and avoiding network congestion. Experiments show that CMRepair with the CTP algorithm reduces the repair time by 27.59%-58.12% while introducing only negligible computation overhead, and CMRepair with the RTP algorithm reduces the repair time by 33.52%-97.75% within an acceptable computation time, compared with existing repair techniques.
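The abstract describes the greedy, load-aware per-stripe selection only at a high level. As a rough illustration of that idea (not the paper's actual CTP algorithm), the following Python sketch assigns each stripe's cross-rack transfer to the helper rack whose uplink would finish soonest given the load already assigned to it; the function and variable names, the candidate-rack model, and the cost model are all assumptions made for this example.

```python
# Illustrative sketch of a greedy, load-aware per-stripe choice.
# NOT the paper's CTP algorithm: names and the cost model are hypothetical.
from collections import defaultdict

def greedy_multi_stripe_plan(stripes, uplink_bandwidth):
    """stripes: list of lists; stripes[i] holds the candidate helper racks
    that can serve stripe i's repair traffic.
    uplink_bandwidth: dict mapping rack -> uplink bandwidth (e.g., Gb/s)."""
    load = defaultdict(float)  # cross-rack blocks already assigned per rack
    plan = []
    for candidates in stripes:
        # Estimated finish time if one more block is sent over this rack's uplink.
        def finish_time(rack):
            return (load[rack] + 1.0) / uplink_bandwidth[rack]
        best = min(candidates, key=finish_time)
        load[best] += 1.0
        plan.append(best)
    return plan

# Example: 3 stripes, heterogeneous uplinks (rack B is twice as fast as A and C).
bw = {"A": 1.0, "B": 2.0, "C": 1.0}
stripes = [["A", "B"], ["B", "C"], ["A", "B", "C"]]
print(greedy_multi_stripe_plan(stripes, bw))  # -> ['B', 'B', 'A']
```

Under these assumptions, spreading transfers away from the most loaded or slowest uplinks shortens the bottleneck transfer, which is the same intuition the abstract gives for exploiting heterogeneous cross-rack bandwidth and avoiding congestion.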