Abstract

Delta encoding is a data reduction technique capable of calculating the differences (i.e., delta) among very similar files and chunks. It is widely used for various applications, such as synchronization replication, backup/archival storage, cache compression, etc. However, delta encoding is computationally costly due to its time-consuming word-matching operations for delta calculation. Existing delta encoding approaches either run at a slow encoding speed, such as Xdelta and Zdelta, or at a low compression ratio, such as Ddelta and Edelta. In this paper, we propose Gdelta, a fast delta encoding approach with a high compression ratio. The key idea behind Gdelta is the combined use of five techniques: (1) employing an improved Gear-based rolling hash to replace Adler32 hash for fast scanning overlapping words of similar chunks, (2) adopting a quick array-based indexing for word-matching, (3) applying a sampling indexing scheme to reduce the cost of traditional building full indexes for base chunks’ words, (4) skipping unmatched words to accelerate delta encoding through non-redundant areas, and (5) last but not least, after word-matching, further batch compressing the remainder to improve the compression ratio. Our evaluation results driven by seven real-world datasets suggest that Gdelta achieves encoding/decoding speedups of 3.5X ∼ 25X over the classic Xdelta and Zdelta approaches while increasing the compression ratio by about 10% ∼ 240%.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call