Unequal Failure Protection Coding Technique for Distributed Cloud Storage Systems

Yupeng Hu, Yonghe Liu, Wenjia Li, Nong Xiao, Kenli Li, Keqin Li, Zheng Qin

doi:10.1109/tcc.2017.2785396

Abstract

In recent years, erasure codes have become the de facto standard for data protection in large-scale distributed cloud storage systems, at the cost of an affordable storage overhead. However, traditional erasure coding schemes, such as Reed-Solomon codes, suffer from high reconstruction cost and I/O. The recent past has seen a plethora of efforts to optimize the tradeoff between reconstruction cost, I/O, and storage overhead. Quite different from all prior studies, the erasure coding technique in this paper makes the first attempt to take advantage of the unequal failure rates across disks/nodes to optimize system reliability and reconstruction performance. Specifically, our proposed technique, the Unequal Failure Protection based Local Reconstruction Code (UFP-LRC), divides the data blocks into several unequal-sized groups with local parities and assigns the data blocks stored on more failure-prone disks/nodes to the smaller groups, so as to provide unequal failure protection for each group. In this way, by exploiting nonuniform local parity degrees, UFP-LRC enables the data blocks stored on more failure-prone disks/nodes to tolerate a greater number of failures while incurring less repair cost than the others, leading to a substantial improvement in the overall reliability and repair performance of cloud storage systems. We perform numerical analysis and build a prototype storage system to verify our approach. The analytical results show that UFP-LRC increasingly outperforms LRC as the failure rate ratio increases. In addition, extensive experiments show that, compared to LRC, UFP-LRC achieves a 10 to 15 percent improvement in throughput and an 8 to 12 percent reduction in decoding latency, while retaining comparable overall reliability.
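The grouping idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (assign_groups, xor_blocks, local_parities, repair), the rule of sorting blocks by failure rate, and the use of plain XOR for local parities are all assumptions made for illustration, and global parities are omitted entirely. It only shows why a block in a smaller local group needs fewer reads to repair.

```python
from functools import reduce


def assign_groups(failure_rates, group_sizes):
    """Hypothetical placement rule: the most failure-prone blocks
    go into the smallest local groups."""
    if sum(group_sizes) != len(failure_rates):
        raise ValueError("group sizes must cover every data block exactly once")
    # Block indices ordered from most to least failure-prone.
    order = sorted(range(len(failure_rates)),
                   key=lambda i: failure_rates[i], reverse=True)
    groups, start = [], 0
    for size in sorted(group_sizes):  # smallest group first
        groups.append(order[start:start + size])
        start += size
    return groups


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def local_parities(data, groups):
    """One XOR local parity per group (a simplification of LRC local parities)."""
    return [xor_blocks([data[i] for i in group]) for group in groups]


def repair(data, groups, parities, lost):
    """Rebuild a single lost block from its group's surviving blocks and
    local parity. Returns (rebuilt_block, number_of_blocks_read)."""
    gi = next(k for k, group in enumerate(groups) if lost in group)
    survivors = [data[i] for i in groups[gi] if i != lost]
    return xor_blocks(survivors + [parities[gi]]), len(survivors) + 1


# Six data blocks; blocks 1 and 3 sit on the more failure-prone disks (assumed rates).
rates = [0.01, 0.05, 0.01, 0.04, 0.01, 0.01]
data = [bytes([i] * 4) for i in range(6)]
groups = assign_groups(rates, group_sizes=[4, 2])
parities = local_parities(data, groups)
block, reads = repair(data, groups, parities, lost=1)
```

In this toy layout, repairing a block from the two-block group reads two blocks (one surviving data block plus the local parity), while repairing a block from the four-block group reads four.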

Full Text