Abstract
Erasure coding techniques have recently been considered essential for ensuring the reliability of modern distributed storage systems (DSSs), in which node-level failures are frequent. In particular, locally repairable codes (LRCs) are widely adopted for their practical advantage of reducing repair latency. However, recent studies show that many system failures also originate from silent disk errors. For conventional LRCs with low error correction capability, the repair process of erasure coding can propagate these silent errors, and the DSSs thus become more vulnerable than in the case of node failures alone. We therefore propose a mean time to data loss (MTTDL) analysis based on a modified Markov chain model in order to evaluate the effects of silent disk errors. In addition, we propose a new design of binary error-resilient locally repairable codes (ER-LRCs) with high error and erasure correction capabilities, which achieve larger bit-wise minimum Hamming distances than existing LRCs. ER-LRCs can be constructed by modifying the parity-check matrices of well-known optimal binary and nonbinary LRCs. Numerical analysis using the proposed Markov model with empirical parameters shows that the proposed ER-LRCs achieve better MTTDL values than existing LRCs.
Highlights
Resilience of cluster-level distributed storage systems (DSSs) has been one of the significant issues in the stable operation of cloud data centers and high-performance computing (HPC) centers
To improve reliability against disk failures using maximum distance separable (MDS) array erasure codes, sector-disk (SD) and maximally recoverable (MR) erasure codes were designed to optimize for simultaneous failures at two granularities, both node and disk [2]–[10]
We first propose the bit-wise minimum Hamming distance db, a new code parameter closely related to the error correction capability against silent data corruption (SDC)
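For a binary linear code, the bit-wise minimum Hamming distance is the smallest Hamming weight over all nonzero codewords. The brute-force sketch below illustrates the definition on a small example; the function name, the enumeration approach, and the [7,4] Hamming code used as input are illustrative assumptions, not the construction proposed in the paper.

```python
import itertools

def min_hamming_distance(G):
    """Minimum Hamming weight over all nonzero codewords of the binary
    linear code generated by G (a list of rows over GF(2)).
    Brute force: enumerates all 2^k - 1 nonzero messages."""
    k = len(G)       # dimension (number of generator rows)
    n = len(G[0])    # code length
    best = n
    for msg in itertools.product([0, 1], repeat=k):
        if not any(msg):
            continue  # skip the all-zero codeword
        # Codeword = msg * G over GF(2) (XOR of the selected rows)
        cw = [0] * n
        for i, m in enumerate(msg):
            if m:
                cw = [c ^ g for c, g in zip(cw, G[i])]
        best = min(best, sum(cw))
    return best

# Example: systematic generator matrix of the [7,4] Hamming code
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
print(min_hamming_distance(G))  # → 3
```

Enumerating all 2^k codewords is only feasible for short codes, but it makes the parameter being optimized by the ER-LRC designs concrete: a larger db means more bit errors can be corrected or detected.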
Summary
Resilience of cluster-level distributed storage systems (DSSs) has been one of the significant issues in the stable operation of cloud data centers and high-performance computing (HPC) centers. To improve the resilience of DSSs against SDCs, modern DSSs periodically check data consistency, a method called disk scrubbing, by verifying the checksums of the RAID or the erasure code across all disks, which eventually increases the total cost of ownership (TCO) for service providers. To this end, evaluating reliability while accounting for system failures in large-scale DSSs is necessary, but exact analysis or simulation is hard to carry out [16].
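The MTTDL of a Markov failure model is the expected time to reach the absorbing data-loss state. As a minimal sketch, assuming a generic three-state chain (all disks good, one disk failed, data loss) rather than the paper's modified model, the MTTDL from the all-good state can be obtained by solving t = -Q_T^{-1} 1 for the transient part Q_T of the generator matrix; the parameter names and the toy failure/repair rates below are assumptions for illustration.

```python
def mttdl(n, lam, mu):
    """Expected time to data loss for a toy 3-state Markov model of an
    n-disk group: per-disk failure rate lam, repair rate mu.
    States: 0 = all good, 1 = one failed, 2 = data loss (absorbing).
    Solves t = -Q_T^{-1} 1 for the 2x2 transient generator Q_T."""
    # Transient generator Q_T over states {0, 1}
    a, b = -n * lam, n * lam
    c, d = mu, -(mu + (n - 1) * lam)
    det = a * d - b * c
    # First component of t = -Q_T^{-1} . [1, 1]^T
    return -(d - b) / det

# Illustrative rates: 10 disks, MTTF 1e5 hours, 24-hour repair
print(mttdl(10, 1e-5, 1 / 24))
```

For this chain the result matches the classical closed form (mu + (2n-1)*lam) / (n*(n-1)*lam^2); the paper's modified chain adds states and transitions for silent disk errors, which this generic sketch does not model.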