ADMAD: Application-Driven Metadata Aware De-duplication Archival Storage System

Chuanyi Liu, Dong-Sheng Wang, Chunhui Shi, Guanlin Lu, Yingping Lu, David H C Du

doi:10.1109/snapi.2008.11

Abstract

Current storage systems hold a huge amount of duplicated or redundant data. Data de-duplication, which uses lossless data compression schemes to minimize duplicated data at the inter-file level, has therefore received broad attention in recent years. Research challenges remain in current approaches and storage systems, such as how to chunk files more efficiently and better leverage the potential similarity and identity among dedicated applications, and how to store the chunks effectively and reliably on secondary storage devices. In this paper, we propose ADMAD, an application-driven metadata aware de-duplication archival storage system. ADMAD uses metadata available at different levels of the I/O path to direct the partitioning of files into more meaningful data chunks (MC), so as to maximally reduce inter-file level duplication. However, the resulting chunks vary in size, and storing them directly on storage devices may cause heavy fragmentation and a high percentage of random disk accesses, which is very inefficient. Therefore, in ADMAD, chunks are further packaged into fixed-size objects that serve as the storage units, which speeds up I/O and eases data management. Preliminary experiments demonstrate that the proposed system further reduces the required storage space compared with current methods (by 20% to nearly 50%, depending on the dataset) and largely improves write performance (by about 50%-70% on average).
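
The two ideas in the abstract, cutting files at metadata-supplied boundaries and packing the resulting variable-size chunks into fixed-size storage objects, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `split_at_boundaries`, `DedupStore`, and `OBJECT_SIZE`, the use of SHA-1 fingerprints, and the tiny object size are all assumptions made for illustration, and the abstract does not say how ADMAD derives its boundaries or lays out its objects.

```python
import hashlib

OBJECT_SIZE = 64  # hypothetical fixed object size in bytes (tiny, for illustration only)


def split_at_boundaries(data, boundaries):
    """Cut data at offsets supplied by application metadata (assumed input)."""
    inner = sorted({b for b in boundaries if 0 < b < len(data)})
    cuts = [0] + inner + [len(data)]
    return [data[a:b] for a, b in zip(cuts, cuts[1:]) if a < b]


class DedupStore:
    """Toy chunk store: one copy per unique chunk, packed into objects."""

    def __init__(self, object_size=OBJECT_SIZE):
        self.object_size = object_size
        self.index = {}                # fingerprint -> (object_id, offset, length)
        self.objects = [bytearray()]   # last entry is the object being filled

    def put_chunk(self, chunk):
        fp = hashlib.sha1(chunk).hexdigest()
        if fp in self.index:           # duplicate chunk: store nothing new
            return fp
        current = self.objects[-1]
        # Open a new object when the chunk does not fit in the current one.
        # Simplification: no padding, and an oversized chunk gets an object
        # to itself instead of being split.
        if current and len(current) + len(chunk) > self.object_size:
            current = bytearray()
            self.objects.append(current)
        self.index[fp] = (len(self.objects) - 1, len(current), len(chunk))
        current.extend(chunk)
        return fp

    def put_file(self, data, boundaries):
        """Store a file and return its recipe (ordered chunk fingerprints)."""
        return [self.put_chunk(c) for c in split_at_boundaries(data, boundaries)]

    def get_file(self, recipe):
        """Rebuild a file from its recipe."""
        out = bytearray()
        for fp in recipe:
            object_id, offset, length = self.index[fp]
            out += self.objects[object_id][offset:offset + length]
        return bytes(out)


if __name__ == "__main__":
    store = DedupStore()
    header = b"H" * 16                 # shared prefix, e.g. a common file header
    file_a = header + b"A" * 40
    file_b = header + b"B" * 40
    recipe_a = store.put_file(file_a, [16])
    recipe_b = store.put_file(file_b, [16])
    print(len(store.index), "unique chunks in", len(store.objects), "objects")
    assert store.get_file(recipe_a) == file_a
    assert store.get_file(recipe_b) == file_b
```

In this toy run the two files share a header chunk, so only three unique chunks are stored rather than four. Cutting exactly at the header boundary is what exposes the duplicate, which is the intuition behind using metadata to guide chunking.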
