Abstract

Data replication is a key technique for achieving high data availability, reliability, and optimized performance in distributed storage systems. In recent years, with the emergence of new storage devices, heterogeneous object-based storage systems, such as systems that mix hard disk drives, solid state drives, and other non-volatile memory devices, have become increasingly attractive because they combine the merits of different storage devices. However, existing data replication schemes do not yet adequately account for the distinct characteristics of heterogeneous storage devices, which can lead to suboptimal performance. This article introduces a new data replication scheme, the Pattern-directed Replication Scheme (PRS), to achieve efficient data replication in heterogeneous storage systems. Unlike traditional schemes, PRS selectively replicates data objects and distributes the replicas across storage devices according to the devices' characteristics. It aggregates objects that exhibit I/O correlation into object groups by calculating the distance between objects, and it replicates the grouped objects according to the application's identified data access pattern. In addition, PRS uses a pseudo-random algorithm to optimize replica placement, taking the performance and capacity of each storage device into account. We evaluated PRS with extensive tests in Sheepdog, a typical object-based storage system. The experimental results confirm that it is a highly efficient replication scheme for heterogeneous storage systems. For instance, read performance improved by 105 percent to nearly 10x compared with existing replication schemes.
