Abstract

To support large-scale data-intensive applications, massive distributed storage platforms are being widely deployed across IT infrastructures. One of the most frequently discussed issues on distributed storage platforms is how to maintain desirable data availability without incurring excessive cost, and data replication services play a key role in achieving this goal. Unfortunately, many existing replication policies are designed for small-scale or centralised storage platforms, and their performance tends to degrade dramatically when a system consists of thousands of autonomous storage nodes. In this paper, we present a novel replication policy that allows a storage platform to eliminate useless replicas while maintaining sufficient data availability. Through theoretical analysis, we prove that the cost of the proposed policy increases linearly with the number of underlying storage nodes, which means that it can be easily applied to large-scale distributed storage platforms. The experimental results indicate that the proposed replication scheme can significantly improve the effective utilisation of storage resources compared with other existing policies. In addition, it exhibits better robustness when the underlying storage platform is subject to dramatically fluctuating workloads.

