Data deduplication is a widely employed technique in backup systems to enhance storage efficiency by eliminating duplicate chunks. Delta compression complements deduplication by removing redundant data between similar chunks. However, when integrated into deduplication-based backup systems, delta compression can considerably decrease backup throughput due to the additional I/Os needed to fetch base chunks. Moreover, data deduplication can cause chunk fragmentation, scattering chunks that are logically contiguous in a backup across the storage. We observe that the number of additional I/Os for fetching base chunks under delta compression depends on the degree of chunk fragmentation. When chunk fragmentation is not significant, delta compression can improve storage efficiency without significantly impacting backup throughput. Based on this observation, we propose a redundancy elimination approach called Fragmentation-aware Redundancy Elimination (FaRE) for in-line backup systems. The main idea behind FaRE is the combined use of three techniques: fragmentation estimation to assess the degree of chunk fragmentation, sequential deduplication to significantly reduce chunk fragmentation by deduplicating only against chunks with a sequential layout when fragmentation is severe, and local redundancy elimination to greatly enhance storage efficiency by performing both deduplication and delta compression when chunk fragmentation is not significant. Our evaluation results demonstrate that FaRE achieves higher storage efficiency and restoration performance than traditional approaches that rely on deduplication alone, while achieving comparable backup throughput due to the limited number of additional I/Os required.
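To illustrate the decision flow the abstract describes, the following is a minimal sketch of a fragmentation-aware policy that estimates fragmentation and then switches between sequential deduplication and local redundancy elimination. All identifiers, thresholds, and interfaces below (ChunkStore, FRAG_THRESHOLD, find_similar, the fragmentation proxy, etc.) are illustrative assumptions, not FaRE's actual design or parameters.

```python
# Sketch of a fragmentation-aware redundancy elimination policy, assuming a toy
# container-based chunk store. Names and thresholds are hypothetical.
from dataclasses import dataclass, field
from typing import Optional

FRAG_THRESHOLD = 0.5  # assumed cutoff separating "severe" from "mild" fragmentation


@dataclass
class ChunkStore:
    """Toy store; a real system would index fingerprints and similarity features."""
    by_fp: dict = field(default_factory=dict)       # fingerprint -> container id
    by_feature: dict = field(default_factory=dict)  # similarity feature -> container id
    next_container: int = 0

    def lookup(self, fp: str) -> Optional[int]:
        return self.by_fp.get(fp)

    def find_similar(self, feature: str) -> Optional[int]:
        return self.by_feature.get(feature)

    def write(self, fp: str, feature: str) -> int:
        cid = self.next_container
        self.next_container += 1
        self.by_fp[fp] = cid
        self.by_feature[feature] = cid
        return cid


def estimate_fragmentation(container_refs: list[int]) -> float:
    """Proxy metric: fraction of consecutive references that jump to a non-adjacent container."""
    if len(container_refs) < 2:
        return 0.0
    jumps = sum(1 for a, b in zip(container_refs, container_refs[1:]) if b not in (a, a + 1))
    return jumps / (len(container_refs) - 1)


def process_chunk(fp: str, feature: str, store: ChunkStore, refs: list[int]) -> tuple:
    """Return the action taken for one incoming chunk and its container reference."""
    frag = estimate_fragmentation(refs)
    match = store.lookup(fp)  # exact-duplicate lookup

    if frag >= FRAG_THRESHOLD:
        # Severe fragmentation: sequential deduplication only -- deduplicate
        # against a stored chunk only if it keeps the restore stream sequential.
        if match is not None and refs and match in (refs[-1], refs[-1] + 1):
            return ("dedup", match)
        return ("store", store.write(fp, feature))

    # Mild fragmentation: local redundancy elimination -- exact dedup plus
    # delta compression against a similar base chunk when one is found.
    if match is not None:
        return ("dedup", match)
    base = store.find_similar(feature)
    if base is not None:
        return ("delta", base)  # a real system would encode and write the delta
    return ("store", store.write(fp, feature))
```

The point of the sketch is the branching itself: delta compression (and its base-chunk reads) is attempted only when the estimated fragmentation is low, which is how the approach keeps the extra I/Os bounded.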