Abstract

The growing data volume in many applications creates a pressing need for reliable data management in distributed storage systems. Classical erasure codes, such as Reed‐Solomon codes and local reconstruction codes, are widely adopted by distributed storage systems. However, existing research mainly focuses on proposing new optimized codes and largely ignores the encoding process of the classical codes, even though an inefficient encoding process greatly degrades the encoding performance of a distributed storage system. Thus, completing the encoding process efficiently has become a key challenge in adopting the classical codes. In this paper, we propose D2CP, a decentralized redundancy generation scheme based on codes with locality, in which a two‐step framework supports both data patterns (replication‐to‐encoding and direct encoding) and codes with locality under any parameter set. To improve insertion throughput, D2CP adopts a consistent‐hashing‐based data placement technique to guide node selection. To reduce network traffic, D2CP employs a data‐sending scheduling technique to order transmissions from the source nodes and a cooperative parity generation technique to produce the parity data cooperatively. To evaluate the performance of D2CP, we conduct experiments on our RAID distributed storage system under various parameter settings with 30 physical and 200 virtual servers. Extensive experiments confirm that, compared with typical approaches, D2CP improves the encoding throughput by 20% and 32% and reduces the network traffic cost by 16% and 33% on average for the two data patterns, respectively.
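The abstract mentions consistent‐hashing‐based data placement for node selection. The sketch below illustrates only the general consistent‐hashing technique, not D2CP's actual placement policy; the `HashRing` class, virtual‐node count, and node names are hypothetical assumptions introduced for illustration.

```python
# Minimal consistent-hashing ring for selecting storage nodes.
# Illustrative only: not the placement policy used by D2CP.
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a key to a point on the hash ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node is mapped to several virtual points
        # so that data spreads evenly across the nodes.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def select(self, block_id: str, count: int = 1):
        """Return `count` distinct nodes clockwise from the block's hash."""
        start = bisect.bisect(self._keys, _hash(block_id)) % len(self._ring)
        chosen = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
            if len(chosen) == count:
                break
        return chosen


# Example: pick 3 candidate nodes for a data block.
ring = HashRing([f"node-{i}" for i in range(8)])
print(ring.select("block-42", count=3))
```

With virtual nodes, adding or removing a server only remaps the keys adjacent to its ring positions, which is the usual reason consistent hashing is chosen for placement in distributed storage.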
