Abstract

Erasure codes, such as Reed–Solomon (RS) codes and local reconstruction codes (LRCs), are increasingly adopted in distributed storage systems because they offer lower redundancy than data replication. While these codes save significant storage space, they can incur large I/O overhead and network traffic when reconstructing unavailable data. Most existing storage systems use replication for hot data and an erasure code for warm and cold data, thereby achieving a good tradeoff between storage overhead and recovery performance. However, these systems do not take the access characteristics of the data into account and tend to use only a single erasure code, which limits the opportunity to reduce storage overhead and recovery cost. In this paper, we propose a new adaptive coding selection method that instead uses multiple LRCs for warm data, selected based on the access characteristics of the data. Each time a file is accessed, we assume that each of the involved data blocks is unavailable in turn and calculate the I/O cost of recovering it under each candidate LRC. These I/O costs are summed for each LRC, and the LRC with the minimal total I/O cost is selected for warm data. For cold data, we use an RS code optimized for storage overhead to reduce the storage burden. Our method is implemented on top of the Hadoop Distributed File System (HDFS). Evaluations show that it reduces the storage overhead by up to 5% and the reconstruction traffic by up to 22%.
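The selection procedure for warm data can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the I/O cost model, so the sketch assumes one. In this assumed model, recovering a data block requires reading the other data blocks of its local group plus one local parity block, and blocks the access already reads are not counted again. The names `LRC`, `recovery_cost`, `total_cost`, and `select_lrc` are hypothetical, as is the `(k, l, r)` parameterization (data blocks, local parities, global parities per stripe).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LRC:
    """Hypothetical LRC description: k data, l local parity, r global parity blocks."""
    k: int
    l: int
    r: int

    @property
    def group_size(self) -> int:
        # Data blocks per local group; this sketch assumes l divides k.
        return self.k // self.l


def recovery_cost(code: LRC, accessed: set, missing: int) -> int:
    """Extra blocks read to rebuild data block `missing` (assumed cost model).

    Blocks are identified by a global data-block index within the file.
    """
    stripe = missing // code.k
    group = (missing % code.k) // code.group_size
    start = stripe * code.k + group * code.group_size
    peers = set(range(start, start + code.group_size)) - {missing}
    # Peers not already fetched by this access, plus the local parity block.
    return len(peers - accessed) + 1


def total_cost(code: LRC, accesses) -> int:
    """Sum the cost over every access, treating each accessed block as missing in turn."""
    return sum(recovery_cost(code, set(a), b) for a in accesses for b in a)


def select_lrc(candidates, accesses) -> LRC:
    """Pick the candidate LRC with the minimal summed I/O cost."""
    return min(candidates, key=lambda c: total_cost(c, accesses))


wide = LRC(k=12, l=2, r=2)    # local groups of 6 data blocks
narrow = LRC(k=12, l=3, r=2)  # local groups of 4 data blocks

large_reads = [[0, 1, 2, 3, 4, 5]]  # one access covering a whole wide group
small_reads = [[0, 1]]              # one access touching two blocks

print(select_lrc([wide, narrow], large_reads))  # LRC(k=12, l=2, r=2)
print(select_lrc([wide, narrow], small_reads))  # LRC(k=12, l=3, r=2)
```

Under this assumed model the choice depends on the access pattern: reads that span a whole local group favor the wider code, which also has fewer parity blocks, while small reads favor the code with smaller local groups.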
