Abstract

The amount of encoded data replication in an erasure-coded clustered storage system has a great impact on the bandwidth consumption and network latency, mostly during data reconstruction. Aimed at the reasons that lead to the excess data transmission between racks, a rack aware data block placement method is proposed. In order to ensure rack-level fault tolerance and reduce the frequency and amount of the cross-rack data transmission during data reconstruction, the method deploys partial data block concentration to store the data blocks of a file in fewer racks. Theoretical analysis and simulation results show that our proposed strategy greatly reduces the frequency and data volume of the cross-rack transmission during data reconstruction. At the same time, it has better performance than the typical random distribution method in terms of network usage and data reconstruction efficiency.

Highlights

  • The explosive growth of data volume brings serious challenges to data storage in the age of big data

  • The distributed storage cluster system is built on a large amount of cheap storage devices in growing quantity, and even a single device failure may affect the entire system

  • Researchers have extensively studied the application of deploying erasure codes in a distribution storage cluster system

Read more

Summary

Introduction

The explosive growth of data volume brings serious challenges to data storage in the age of big data. The distributed storage cluster system is a large-scale distribution storage system, deployed in a clustered way, and is widely used for big data storage due to its low cost. The distributed storage cluster system brings more and more challenges in terms of data reliability. The storage system usually uses replication and erasure codes to ensure data reliability. Erasure Coding in Distribution Storage Cluster System. Researchers have extensively studied the application of deploying erasure codes in a distribution storage cluster system. Many measures were taken to improve the performance of erasure-coded clustered storage system, such as simplifying the decoding operation, increasing the number of replicas, or reducing the number of data reconstruction events, etc.

Methods
Findings
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call