As one of the most popular cloud storage systems, OpenStack Swift and its follow-up systems replicate each object across multiple storage nodes and leverage object sync protocols to achieve high reliability and eventual consistency. The performance of object sync protocols heavily relies on two key parameters: $r$ (the number of replicas of each object) and $n$ (the number of objects hosted by each storage node). In existing tutorials and demos, the configurations are usually $r=3$ and $n\leq 1,000$ by default, and the sync process seems to perform well. However, we discover that in data-intensive scenarios, e.g., when $r>3$ and $n\gg 1,000$, the sync process is significantly delayed and produces massive network overhead, referred to as the sync bottleneck problem. By reviewing the source code of OpenStack Swift, we find that its object sync protocol uses a fairly simple but network-intensive approach to check the consistency among replicas of objects; hence, in each sync round, the number of hash values exchanged per node is $\Theta(n \times r)$. To tackle this problem, we propose a lightweight and practical object sync protocol, LightSync, which not only remarkably reduces the sync overhead but also preserves high reliability and eventual consistency. LightSync derives this capability from three novel building blocks: 1) Hashing of Hashes, which aggregates all the $h$ hash values of each data partition into a single representative hash value with a Merkle tree; 2) Circular Hash Checking, which checks the consistency of different partition replicas by sending the aggregated hash value only to the clockwise neighbor; and 3) Failed Neighbor Handling, which detects and handles node failures with moderate overhead to effectively strengthen the robustness of LightSync. The design of LightSync offers a provable guarantee on reducing the per-node network overhead from $\Theta(n \times r)$ to $\Theta(\frac{n}{h})$. Furthermore, we have implemented LightSync as an open-source patch and applied it to OpenStack Swift, reducing the sync delay by up to 879$\times$ and the network overhead by up to 47.5$\times$.
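To make the Hashing of Hashes idea concrete, the following is a minimal sketch of Merkle-tree aggregation: a partition's $h$ per-object hash values are pairwise-hashed level by level until a single root remains. The choice of SHA-256 and the duplication of the last node on odd-sized levels are our assumptions for illustration (Swift internally uses MD5, and the paper's exact tree construction may differ).

```python
import hashlib

def merkle_root(hashes):
    """Aggregate a partition's per-object hash values into one
    representative root hash by pairwise hashing up a Merkle tree."""
    if not hashes:
        return hashlib.sha256(b"").hexdigest()
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed: duplicate last node on odd levels
        level = [
            hashlib.sha256((a + b).encode()).hexdigest()
            for a, b in zip(level[0::2], level[1::2])
        ]
    return level[0]

# Usage: one aggregated hash now represents all h object hashes
# of the partition, so only one value need be exchanged per round.
partition_hashes = [hashlib.sha256(str(i).encode()).hexdigest() for i in range(1000)]
root = merkle_root(partition_hashes)
```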
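Circular Hash Checking can likewise be sketched as follows: the $r$ replica nodes of a partition are viewed as a ring, and each node compares its aggregated hash with only its clockwise successor, rather than with all $r-1$ peers. Here `fetch_root` is a hypothetical RPC that returns the neighbor's Merkle root, and the ring ordering is assumed to come from Swift's consistent-hashing ring.

```python
def clockwise_neighbor(node_id, ring):
    """Return the successor of node_id on the replica ring
    (ring is assumed to be an ordered list of node identifiers)."""
    idx = ring.index(node_id)
    return ring[(idx + 1) % len(ring)]

def circular_check(local_node, ring, local_root, fetch_root):
    """One sync round: compare the aggregated partition hash with the
    clockwise neighbor only; return the neighbor if a repair is needed."""
    neighbor = clockwise_neighbor(local_node, ring)
    remote_root = fetch_root(neighbor)  # hypothetical RPC call
    if remote_root != local_root:
        return neighbor  # inconsistency detected: sync with this neighbor
    return None  # replicas agree; no traffic beyond one hash value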
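The overhead reduction from $\Theta(n \times r)$ to $\Theta(\frac{n}{h})$ can be checked with a back-of-the-envelope calculation; the numbers below are hypothetical, chosen only to illustrate the asymptotics.

```python
n, r, h = 100_000, 5, 1_000  # hypothetical: objects per node, replicas, hashes per partition
full_mesh = n * r            # Θ(n·r): per-object hashes exchanged with all replicas
lightsync = n // h           # Θ(n/h): one aggregated hash per partition, one neighbor
print(full_mesh, lightsync)  # 500000 vs. 100 hash values per node per sync round
```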