Abstract

Multiple data reduction techniques have been investigated to lower storage costs for a wide variety of customers. In this work, we focus on similarity-based delta compression, which calculates and stores the difference between very similar, but non-duplicate, chunks in storage systems. Delta compression is often implemented along with deduplication and has been shown to achieve a much higher compression ratio than deduplication alone. Currently, the N-Transform method is the most popular and widely used approach for generating features from data content (e.g., chunks) to detect similar candidates, to which delta compression is then applied. For delta compression systems, though, the throughput of N-Transform is often the bottleneck. Finesse is a high-throughput variant of N-Transform, but it suffers from lower detection accuracy and a lower compression ratio. The computational overhead of N-Transform consists of two parts: calculating the rolling hash across the data and applying time-consuming transforms to each hash. In this work, we propose Odess, a fast resemblance detection approach that uses a novel Content-Defined Sampling method to generate a much smaller proxy hash set and then applies the transforms only to this small hash set, so the transform step is no longer the bottleneck. Odess also leverages the faster Gear hash to generate rolling hashes. Thus, Odess greatly reduces the computational overhead of resemblance detection while achieving high detection accuracy and a high compression ratio. Our evaluation results show that, on average, Odess is about 5.4× faster than Finesse and about 26.9× faster than N-Transform at generating features for resemblance detection. In an end-to-end data reduction storage system, Odess increases throughput by about 1.36× over Finesse and about 2.76× over N-Transform, while maintaining the compression ratio of N-Transform and improving the compression ratio by about 1.22× over Finesse.
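
The following is a minimal sketch of the idea described above, not the authors' implementation: a Gear-style rolling hash is computed over a chunk, only hashes that pass a content-defined sampling test are kept, and the linear transforms are applied to that sampled subset, keeping the maximum per transform as the feature. The Gear table, the sampling mask (roughly a 1/128 rate), the number of features, the transform coefficients, and all function names are assumptions chosen for illustration.

```python
import random

MASK64 = (1 << 64) - 1
# Assumed sampling mask: 7 bits set, so about 1 in 128 hashes is sampled.
SAMPLE_MASK = 0x7F << 40


def make_gear_table(seed=0):
    """Build a table of 256 pseudo-random 64-bit values (hypothetical table)."""
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(256)]


def make_transforms(n, seed=1):
    """Build n linear transforms (m, a); coefficients are illustrative only."""
    rng = random.Random(seed)
    return [(rng.getrandbits(64) | 1, rng.getrandbits(64)) for _ in range(n)]


def sampled_features(chunk, gear, transforms, sample_mask=SAMPLE_MASK):
    """Return (features, sampled_count) for one chunk.

    Each feature is the maximum of (m * fp + a) mod 2^64 over the sampled
    rolling hashes fp. A feature stays 0 if no hash is sampled.
    """
    features = [0] * len(transforms)
    fp = 0
    sampled = 0
    for byte in chunk:
        # Gear rolling hash: shift left by one bit and add a table entry.
        fp = ((fp << 1) + gear[byte]) & MASK64
        # Content-defined sampling: skip hashes that fail the mask test.
        if fp & sample_mask != 0:
            continue
        sampled += 1
        # The transforms run only on the sampled hashes.
        for i, (m, a) in enumerate(transforms):
            value = (m * fp + a) & MASK64
            if value > features[i]:
                features[i] = value
    return features, sampled


if __name__ == "__main__":
    gear = make_gear_table()
    transforms = make_transforms(12)
    data = bytes(random.Random(42).getrandbits(8) for _ in range(8192))
    features, sampled = sampled_features(data, gear, transforms)
    print(f"transformed {sampled} of {len(data)} rolling hashes")
```

Because the sampling decision depends only on the hash value, two similar chunks tend to sample the same positions in their shared content, which is what would let the reduced hash set stand in for the full one when comparing features.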

