Abstract

Big data storage demands larger clusters, and larger clusters are subject to failures of more nodes. Replication provides reliable storage but is not cost-effective and does not offer a robust defense against data loss. Traditional erasure codes used in RAID systems, such as Reed-Solomon (R-S) codes, are limited in providing high reliability: higher reliability requires codes of larger size, and for a fixed redundancy rate the computational cost of R-S codes grows quadratically with the number of failures they can tolerate. Low-Density Parity-Check (LDPC) codes have been shown to incur lower computational cost and less repair network traffic than R-S based solutions. Unfortunately, no construction method exists for large LDPC codes that keeps both the computational cost and the repair traffic under control. In this paper, a novel method is proposed to construct a family of LDPC codes, expanCodes, with expandable sizes. The proposed expanCodes keep the encoding and decoding complexity unchanged as the code size grows, so increased reliability can be achieved without additional computation or repair traffic. expanCodes is integrated with the Hadoop system, and simulations show a decrease of more than 29% in encoding and decoding latency compared with R-S based solutions.
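The key property claimed in the abstract, that encoding and repair cost depend on the (fixed) number of blocks touched by each parity rather than on the overall code size, can be illustrated with a minimal XOR-based sketch. This is not the paper's expanCodes construction; the sparse parity layout (`PARITY_ROWS`) and the function names below are hypothetical, chosen only to show how a low-density parity-check structure keeps per-parity work constant.

```python
# Minimal sketch, assuming a hand-written sparse parity layout (NOT expanCodes):
# each parity block is the XOR of a small, fixed number of data blocks, so the
# cost per parity stays constant no matter how many data blocks the code spans.
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)


# Hypothetical sparse structure: each parity protects 3 of the 6 data blocks.
PARITY_ROWS = [
    (0, 2, 4),
    (1, 3, 5),
    (0, 3, 4),
]


def encode(data_blocks):
    """Compute one parity block per sparse row; cost is O(row weight) each."""
    return [xor_blocks([data_blocks[j] for j in row]) for row in PARITY_ROWS]


def repair_single(data_blocks, parities, lost_index):
    """Recover one lost data block from any parity row that covers it,
    reading only the surviving blocks of that row (low repair traffic)."""
    for row, parity in zip(PARITY_ROWS, parities):
        if lost_index in row:
            survivors = [data_blocks[j] for j in row if j != lost_index]
            return xor_blocks(survivors + [parity])
    raise ValueError("no parity row covers the lost block")


if __name__ == "__main__":
    data = [bytes([i]) * 8 for i in range(6)]   # six 8-byte data blocks
    parities = encode(data)
    # Simulate losing data block 2 and rebuilding it from two survivors + parity.
    assert repair_single(data, parities, lost_index=2) == data[2]
```

In this toy layout the repair of one block reads only two surviving data blocks and one parity, independent of the total number of blocks; a dense R-S style code of comparable reliability would need to read an entire stripe.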
