Improving Restore Performance in Deduplication-Based Backup Systems via a Fine-Grained Defragmentation Approach

Yujuan Tan, Jian Wen, Hong Jiang, Witawas Srisa-An, Baiping Wang, Zhichao Yan

doi:10.1109/tpds.2018.2828842

Open Access

https://doi.org/10.1109/tpds.2018.2828842

Abstract

In deduplication-based backup systems, the removal of redundant data transforms otherwise logically adjacent data chunks into chunks that are physically scattered across the disks. This, in effect, changes retrieval operations from sequential to random and significantly degrades restore performance. These scattered chunks are called fragmented data, and many techniques have been proposed to identify such data and rewrite it sequentially to new address areas, trading increased storage space for a reduced number of random reads (disk seeks) to improve restore performance. However, existing solutions for backup workloads share a common assumption that every read operation involves a large fixed-size window of contiguous chunks, which restricts fragment identification to a fixed-size read window. This can lead to inaccurate identification due to false positives, since data fragments vary in size and can appear at unpredictable address locations. Based on these observations, we propose FGdefrag, a Fine-Grained defragmentation approach that uses variable-sized and adaptively located data groups, instead of fixed-size read windows, to accurately identify and effectively remove fragmented data. Compared with existing solutions, FGdefrag not only reduces the amount of rewritten data but also significantly improves restore performance. Our experimental results show that FGdefrag can improve restore performance by 14 to 329 percent while simultaneously reducing the rewritten data by 25 to 87 percent.
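To make the fragmentation effect concrete, the following minimal sketch counts how many sequential runs (and hence roughly how many disk seeks) a restore needs, given the physical address of each chunk in logical order. It is an illustration only, not the FGdefrag algorithm: the function names, the unit-sized chunks, and the example address lists are all assumptions introduced here.

```python
from typing import List, Tuple


def contiguous_runs(addresses: List[int]) -> List[Tuple[int, int]]:
    """Group chunk addresses, given in logical restore order, into runs of
    physically adjacent chunks. Returns (start_address, length) per run.

    Assumption: every chunk occupies exactly one address unit.
    """
    runs: List[Tuple[int, int]] = []
    for addr in addresses:
        if runs and addr == runs[-1][0] + runs[-1][1]:
            # This chunk directly follows the previous one on disk.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            # A physical gap: the restore has to seek to a new location.
            runs.append((addr, 1))
    return runs


def count_seeks(addresses: List[int]) -> int:
    """Approximate the number of seeks as the number of contiguous runs."""
    return len(contiguous_runs(addresses))


# Hypothetical layouts: the first backup is stored sequentially, while a
# later, deduplicated backup references old chunks mixed with new ones.
first_backup = [100, 101, 102, 103, 104, 105]
later_backup = [100, 101, 230, 231, 102, 103, 480, 104]

print(count_seeks(first_backup))
print(count_seeks(later_backup))
print(contiguous_runs(later_backup))
```

In this hypothetical layout the runs of the later backup have different lengths and start at arbitrary addresses, which is the property the abstract points to when it argues that a fixed-size read window can misjudge which chunks are fragmented.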
