Abstract

The failure of individual unreliable commodity components is very common in large-scale distributed storage systems, and many studies have sought to ensure data reliability in such systems. Among the proposed techniques, erasure codes are widely used in production storage systems such as the Hadoop Distributed File System (HDFS), because they provide high fault tolerance with lower storage overhead than replication. However, recovering an erasure-coded storage system after a node failure usually consumes substantial cross-node and cross-rack bandwidth, which slows failure recovery and wastes additional resources. In this paper, we improve the erasure-coding storage strategy of distributed storage systems and propose HV-SNSP, a low-overhead data recovery method based on cross-checking. HV-SNSP realizes horizontal and vertical cross parity checking by adding RS parity inside each data node, yielding an $\mathrm{H}^{\mathrm{RS}(n,k)}$-$\mathrm{V}^{\mathrm{RS}(n',k')}$ storage architecture. Based on this architecture, we design a low-cost supply node selection strategy, SNSP, which selects nodes with shorter network distance and lower load to participate in recovery.
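The abstract does not give the encoding details, so the following is only a toy sketch of the horizontal/vertical cross-parity idea. It assumes a single XOR parity block (the simplest MDS erasure code) as a stand-in for the general RS(n, k) and RS(n', k') codes, and the helper names `build_layout`, `repair_vertical`, and `repair_horizontal` are hypothetical. Each data node stores its blocks plus a vertical (intra-node) parity block, and one extra node stores the horizontal (cross-node) parity of every row, so a lost block can be rebuilt either locally or from the other nodes.

```python
from functools import reduce


def xor_blocks(blocks):
    # Bytewise XOR of equal-length blocks (single-parity stand-in for RS).
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_layout(data):
    """data[node][row] -> grid where every data node gets a vertical parity
    block appended, and one extra node holds the horizontal parity of each row."""
    grid = [list(node) + [xor_blocks(node)] for node in data]  # vertical parity
    rows = len(grid[0])
    horizontal = [xor_blocks([node[r] for node in grid]) for r in range(rows)]
    grid.append(horizontal)  # horizontal parity node
    return grid


def repair_vertical(grid, node, row):
    """Rebuild grid[node][row] from the other blocks stored on the same node.
    Returns (block, number of blocks read from other nodes)."""
    survivors = [b for r, b in enumerate(grid[node]) if r != row]
    return xor_blocks(survivors), 0  # purely local, no cross-node transfer


def repair_horizontal(grid, node, row):
    """Rebuild grid[node][row] from the same row on every other node.
    Returns (block, number of blocks read from other nodes)."""
    survivors = [grid[n][row] for n in range(len(grid)) if n != node]
    return xor_blocks(survivors), len(survivors)


if __name__ == "__main__":
    data = [[b"\x01\x02", b"\x03\x04"],
            [b"\x10\x20", b"\x30\x40"],
            [b"\xaa\xbb", b"\xcc\xdd"]]
    grid = build_layout(data)
    print(repair_vertical(grid, 1, 0)[1], repair_horizontal(grid, 1, 0)[1])  # 0 3
```

The point of the comparison is the second return value: when a single block on a still-reachable node is damaged, the vertical parity repairs it without touching the network, whereas horizontal repair must pull one block from every other node in the stripe. A full node loss still requires horizontal decoding.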
This strategy effectively reduces the amount of data transmitted, shortens recovery time, and improves recovery efficiency. Experimental results show that, compared with traditional RS, HV-SNSP reduces cross-rack data transmission during recovery by 62.5% and shortens recovery time by up to 42.41%; compared with D3, HV-SNSP reduces cross-rack bandwidth occupation by 25% and shortens recovery time by 36.58%.
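SNSP's exact scoring rule is not spelled out in the abstract. As a hedged illustration of choosing supply nodes by network distance and load, the sketch below ranks surviving nodes with a hypothetical weighted score and keeps the best k. The `Candidate` type, the distance values 2 (same rack) and 4 (different rack), loosely modelled on Hadoop's hop-count topology distance, the load metric, and the weight `alpha` are all assumptions, not the paper's definitions. It also assumes each candidate holds a distinct block of the damaged stripe, so any k of them suffice to decode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A surviving node holding one distinct block of the damaged stripe."""
    name: str
    rack: str
    load: float  # hypothetical utilisation metric in [0, 1]


def rack_distance(cand: Candidate, target_rack: str) -> int:
    # Assumed hop-count style distance: 2 within the same rack, 4 across racks.
    return 2 if cand.rack == target_rack else 4


def select_supply_nodes(candidates, k, target_rack, alpha=0.5):
    """Pick the k candidates with the lowest weighted distance/load score
    (hypothetical scoring rule, not SNSP's published formula)."""
    if len(candidates) < k:
        raise ValueError("not enough surviving blocks to decode the stripe")

    def score(c: Candidate) -> float:
        # Divide distance by 4 so it lies in [0, 1], comparable with load.
        return alpha * rack_distance(c, target_rack) / 4 + (1 - alpha) * c.load

    # Ties are broken by name so the choice is deterministic.
    return sorted(candidates, key=lambda c: (score(c), c.name))[:k]


if __name__ == "__main__":
    nodes = [Candidate("a", "r1", 0.9), Candidate("b", "r1", 0.2),
             Candidate("c", "r2", 0.1), Candidate("d", "r2", 0.8),
             Candidate("e", "r3", 0.3)]
    chosen = select_supply_nodes(nodes, k=3, target_rack="r1")
    print([c.name for c in chosen])  # ['b', 'c', 'e']
```

With `alpha = 0.5`, the busy same-rack node `a` loses to the lightly loaded remote nodes `c` and `e`; with `alpha = 1.0` only rack locality counts and `a`, `b`, `c` are chosen. The weight therefore controls the balance between network distance and load that the strategy is described as trading off.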

