Abstract

Content-defined chunking (CDC) algorithms play an important role in data deduplication, data synchronization, and cloud storage. Existing CDC algorithms suffer from unstable chunk size variance and low chunking throughput when processing low-entropy strings. To solve these problems, this paper proposes two CDC algorithms: Double Extreme (DE) and Rapid Double Extreme (RDE). Both DE and RDE are hash-free chunking algorithms. DE uses the byte values in the sliding window to determine the cut point. Using both the maximum and the minimum allows DE to handle low-entropy strings better and to achieve a small chunk size variance. RDE, which builds on DE, uses a multi-step strategy to achieve higher chunking throughput. We compared DE and RDE with existing CDC algorithms. The experimental results show that DE and RDE significantly reduce chunk size variance and improve chunking throughput compared with other CDC algorithms.

