Abstract

Delta compression (also called delta encoding) is a data reduction technique that computes the differences (i.e., the delta) between very similar files or chunks, and is thus widely used to optimize synchronization, replication, backup/archival storage, cache compression, etc. However, delta compression is costly because the word-matching operations used to calculate the delta are time-consuming. Existing delta encoding approaches either encode slowly, such as Xdelta and Zdelta, or achieve a low compression ratio, such as Ddelta and Edelta. In this paper, we propose Gdelta, a fast delta encoding approach with a high compression ratio. Gdelta improves encoding speed by employing an improved, fast Gear-based rolling hash to scan fine-grained words and a quick array-based indexing scheme for word-matching; after word-matching, it batch-compresses the remaining data to further improve the compression ratio. Our evaluation, driven by six real-world datasets, suggests that Gdelta achieves encoding/decoding speedups of 2X∼4X over the classic Xdelta and Zdelta approaches while increasing the compression ratio by about 10%∼120%.
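
The three steps named above (Gear-based rolling hash over words, array-based index lookup, batch compression of what is left) can be illustrated with a minimal Python sketch. This is not the Gdelta implementation: the word length, the per-byte shift, the index size, the first-occurrence index policy, the `copy`/`insert` instruction format, and the use of zlib as the batch compressor are all assumptions chosen only to keep the example small.

```python
import random
import zlib

WORD = 8              # assumed word length in bytes
SHIFT = 64 // WORD    # assumed shift: a byte leaves the 64-bit hash after WORD steps
MASK = (1 << 64) - 1
INDEX_BITS = 16       # assumed index size: 2**16 array slots

_rng = random.Random(2020)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # random value per byte value


def word_slots(data):
    """Yield (start, slot) for every WORD-byte window, using a Gear-style hash."""
    h = 0
    for i, b in enumerate(data):
        h = ((h << SHIFT) + GEAR[b]) & MASK
        if i >= WORD - 1:
            yield i - WORD + 1, h >> (64 - INDEX_BITS)  # top bits pick the slot


def encode(base, target):
    """Return (ops, packed_literals) describing target relative to base."""
    # Array-based index: slot -> first base offset whose word hashes to that slot.
    index = [-1] * (1 << INDEX_BITS)
    for start, slot in word_slots(base):
        if index[slot] < 0:
            index[slot] = start

    slots = [slot for _, slot in word_slots(target)]
    ops, literals = [], bytearray()
    pos = lit_start = 0
    while pos < len(slots):
        cand = index[slots[pos]]
        # Verify the candidate byte-for-byte, since different words can share a slot.
        if cand >= 0 and base[cand:cand + WORD] == target[pos:pos + WORD]:
            length = WORD
            while (cand + length < len(base) and pos + length < len(target)
                   and base[cand + length] == target[pos + length]):
                length += 1  # extend the match forward
            if pos > lit_start:
                ops.append(("insert", pos - lit_start))
                literals += target[lit_start:pos]
            ops.append(("copy", cand, length))
            pos += length
            lit_start = pos
        else:
            pos += 1
    if lit_start < len(target):
        ops.append(("insert", len(target) - lit_start))
        literals += target[lit_start:]
    # Batch-compress all unmatched bytes together (zlib is a stand-in here).
    return ops, zlib.compress(bytes(literals))


def decode(base, ops, packed):
    """Rebuild the target from base, the instruction list, and packed literals."""
    literals = zlib.decompress(packed)
    out, lit = bytearray(), 0
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += literals[lit:lit + op[1]]
            lit += op[1]
    return bytes(out)
```

Calling `decode(base, *encode(base, target))` should reproduce `target`. The flat array avoids hash-table probing at the cost of dropping colliding words, which is one plausible reading of "quick array-based indexing"; the paper itself should be consulted for the actual design.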
