ACM Transactions on Storage | VOL. 18

From Hyper-dimensional Structures to Linear Structures: Maintaining Deduplicated Data’s Locality

Publication Date Aug 31, 2022


Data deduplication is widely used to reduce the size of backup workloads, but it has a known drawback: poor data locality, also referred to as the fragmentation problem. Fragmentation arises from the gap between the hyper-dimensional structure of deduplicated data and the sequential nature of many storage devices, and it degrades restore and garbage collection (GC) performance. Existing approaches either write duplicates to maintain locality (e.g., rewriting) or cache data in memory or on SSD, but fragmentation continues to lower restore and GC performance. Investigating the locality issue, we design a method that flattens hyper-dimensionally structured deduplicated data into a one-dimensional format based on a classification of each chunk’s lifecycle; this yields our proposed data layout. Furthermore, we present a novel management-friendly deduplication framework, called MFDedup, that applies our data layout and maintains locality as much as possible. Specifically, MFDedup uses two key techniques: Neighbor-Duplicate-Focus indexing (NDF) and the Across-Version-Aware Reorganization scheme (AVAR). NDF performs duplicate detection against a previous backup; AVAR then rearranges chunks with an offline, iterative algorithm into a compact, sequential layout, which nearly eliminates random I/O during file restores after deduplication. Evaluation results with five backup datasets demonstrate that, compared with state-of-the-art techniques, MFDedup achieves deduplication ratios t...
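The two ideas named in the abstract, detecting duplicates only against the previous backup and grouping chunks by lifecycle, can be illustrated with a small Python sketch. This is not the MFDedup implementation. The fixed-size chunking, the SHA-1 fingerprints, the function names, and the (first version, last version) grouping key are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

CHUNK_SIZE = 4  # hypothetical, tiny fixed-size chunks chosen only for readability


def chunk_fingerprints(data, size=CHUNK_SIZE):
    """Split data into fixed-size chunks and return one SHA-1 fingerprint per chunk."""
    return [hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def neighbor_dedup(versions):
    """Compare each backup version only against its immediate predecessor.

    Returns one (new, duplicate) pair of fingerprint sets per version.
    """
    results = []
    prev = set()
    for data in versions:
        current = set(chunk_fingerprints(data))
        results.append((current - prev, current & prev))
        prev = current  # only the neighboring version is kept as the index
    return results


def classify_by_lifecycle(versions):
    """Group stored chunks by (first version, last version), numbered from 1.

    Assumption: a chunk's lifecycle is the run of consecutive versions that
    reference it. A chunk that disappears and later returns starts a new run.
    """
    live = {}  # fingerprint -> version in which the current copy was stored
    groups = defaultdict(list)
    for v, data in enumerate(versions, start=1):
        current = list(dict.fromkeys(chunk_fingerprints(data)))  # ordered, unique
        current_set = set(current)
        # Chunks referenced by version v-1 but not by v ended their lifecycle at v-1.
        for fp in [fp for fp in live if fp not in current_set]:
            groups[(live.pop(fp), v - 1)].append(fp)
        for fp in current:
            live.setdefault(fp, v)
    # Chunks still referenced by the newest version.
    for fp, born in live.items():
        groups[(born, len(versions))].append(fp)
    return dict(groups)


def groups_for_restore(groups, version):
    """Return the lifecycle groups that a restore of `version` would need to read."""
    return sorted(key for key in groups if key[0] <= version <= key[1])
```

With three toy versions `b"AAAABBBBCCCC"`, `b"AAAABBBBDDDD"`, and `b"AAAAEEEEDDDD"`, the chunk `CCCC` falls into group (1, 1), `BBBB` into (1, 2), and `AAAA` into (1, 3). In this sketch, restoring a version reads only whole groups whose range covers that version, and a group whose last version has expired could be dropped as a unit.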


Keywords: Data Layout, Garbage Collection, Deduplicated Data, Backup Versions, Sequential Layout, Previous Backup, Reorganization Scheme, Fragmentation Problem, Caching Data, Key Techniques

