Abstract

As distributed storage clusters have become widely used in recent years, data replication management, which is the key to data availability, has become an active research topic. In storage clusters, internal network bandwidth is usually a scarce resource. Misplaced replicas can consume excessive network bandwidth and substantially degrade the overall performance of the cluster. To reduce internal network traffic and improve load balancing in distributed storage clusters, we developed a centralized replication management scheme, referred to as CRMS. We propose a model that captures the relationships among block access probability, replica location, and network traffic. Based on this model, the replica placement problem is formulated as a 0–1 programming problem. Starting from a feasible solution to this problem, a heuristic carries out the replica adjustments step by step. CRMS is evaluated using the access history of a distributed storage cluster at Xunlei Inc., one of the leading Internet companies in China. The experimental results show that CRMS can greatly reduce internal network bandwidth consumption while keeping the cluster's storage usage balanced.
