Abstract

Data deduplication is a widely used technique that removes duplicate data to reduce storage overhead. However, deduplication typically cannot eliminate the redundancy among nonidentical but similar data chunks, so delta compression is often applied to further compress the post-deduplication data. While the two techniques are effective in saving storage space, they introduce complex references among data chunks, which inevitably undermines system reliability and causes fragmentation that may degrade restore performance. In this paper, we observe that delta-compressed chunks (DCCs) are much smaller than regular chunks (non-DCCs), and that most base chunks of DCCs that become fragmented remain fragmented across consecutive backups. Based on these observations, we introduce a framework that combines replication and erasure coding and uses History-aware Delta Selection to ensure high reliability and restore performance. Specifically, the framework uses a delta-utilization-aware filter to maintain cache locality and a cooperative cache scheme (CCS) to avoid unnecessary container reads. Moreover, it selectively performs delta compression based on historical information to avoid cyclic fragmentation in consecutive backups. Experimental results on four real-world datasets demonstrate that the framework improves restore performance by 58.3%–76.7% with low storage overhead.
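
The abstract only outlines the history-aware selection idea, so the following is a minimal, illustrative Python sketch of how such a decision might look. All names here (`Chunk`, `select_for_delta_compression`, `previously_fragmented_bases`) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Chunk:
    fingerprint: str
    data: bytes
    base_fingerprint: Optional[str] = None  # set when a similar base chunk exists


def select_for_delta_compression(chunk: Chunk,
                                 previously_fragmented_bases: Set[str]) -> bool:
    """Decide whether to delta-compress `chunk` against its base chunk."""
    if chunk.base_fingerprint is None:
        # No similar base chunk found: store as a regular (non-DCC) chunk.
        return False
    if chunk.base_fingerprint in previously_fragmented_bases:
        # The base chunk was already fragmented in the previous backup;
        # compressing against it again tends to keep it fragmented in the
        # next backup (cyclic fragmentation), so store the chunk in full.
        return False
    # The base chunk is well laid out, so delta compression is worthwhile.
    return True


# Usage with hypothetical values:
history = {"base-42"}  # base chunks seen fragmented in the previous backup
chunk = Chunk(fingerprint="c-7", data=b"...", base_fingerprint="base-42")
print(select_for_delta_compression(chunk, history))  # -> False
```

The point of the sketch is only the selection criterion: a chunk whose base was fragmented in the previous backup is stored in full rather than as a delta, trading a little storage for better restore locality.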
