Abstract

Due to the explosive growth of storage demands, distributed storage systems need to support storage scaling efficiently. Recent work optimizes scaling in a decentralized manner for Reed-Solomon coded storage systems. In this paper, we focus on storage scaling for storage systems with regenerating codes and design two efficient scaling algorithms, one for minimum bandwidth regenerating (MBR) codes and one for minimum storage regenerating (MSR) codes. We integrate these two scaling algorithms into the Hadoop Distributed File System (HDFS), and experiments on Amazon EC2 show that the scaling bandwidth can be reduced by up to 75% and 43.8% compared with centralized scaling.

Highlights

  • Many distributed storage systems adopt erasure coding (e.g., Reed-Solomon codes [22]) to tolerate node failures with low redundancy.

  • We focus on two kinds of deterministic regenerating codes: E-MBR codes [20], which provide a generalized construction scheme for minimum bandwidth regenerating (MBR) codes, and Butterfly codes [18], which are practical minimum storage regenerating (MSR) codes deployed in real storage systems (e.g., Hadoop Distributed File System (HDFS) and Ceph).

  • Proof: In Cases 1 and 2, we find that the scaling bandwidth of EMBRScale (see Equations (6) and (7)) is equal to the lower bound of the scaling bandwidth for E-MBR codes (see Equation (4)).


Summary

Introduction

Many distributed storage systems adopt erasure coding (e.g., Reed-Solomon codes [22]) to tolerate node (or server) failures with low redundancy. Erasure coding can achieve significantly higher reliability than replication at the same storage overhead [27], and has been widely adopted in distributed storage systems [8], [15], [23] and practical cloud storage systems, e.g., Azure [15] and Facebook [17]. These storage systems often need to add new storage nodes to increase both storage space and service bandwidth to accommodate growing storage demands.

Five data blocks (i.e., 1, ..., 5), one parity block P (generated by XOR-summing the five data blocks), and their duplicates are distributed with the complete graph
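The block layout mentioned above can be illustrated with a minimal sketch. It assumes a complete graph on four nodes, chosen only because its six edges match the six blocks (five data blocks plus P); the excerpt does not state the node count. The function names `xor_blocks` and `place_on_complete_graph` and the 4-byte payloads are hypothetical and are not taken from the paper.

```python
from functools import reduce
from itertools import combinations


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def place_on_complete_graph(blocks, num_nodes):
    """Map one block to each edge of the complete graph K_n.

    Both endpoints of an edge store a copy of that edge's block,
    so every block is duplicated on exactly two nodes.
    """
    edges = list(combinations(range(num_nodes), 2))
    if len(edges) != len(blocks):
        raise ValueError("need exactly one block per edge")
    nodes = {i: {} for i in range(num_nodes)}
    for (u, v), (name, payload) in zip(edges, blocks.items()):
        nodes[u][name] = payload
        nodes[v][name] = payload
    return nodes


# Five toy data blocks named "1".."5" (4 bytes each, illustrative only).
data = {str(i): bytes([i] * 4) for i in range(1, 6)}

# Parity block P is the XOR sum of the five data blocks.
parity = xor_blocks(list(data.values()))
blocks = {**data, "P": parity}

# ASSUMPTION: four nodes, since K_4 has six edges for the six blocks.
layout = place_on_complete_graph(blocks, num_nodes=4)

for node, stored in layout.items():
    print(node, sorted(stored))
```

Under this assumed layout each node holds three blocks, and any single failed node can be rebuilt by copying one block from each of the three surviving nodes, because every pair of nodes shares exactly one block.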
