Abstract

The continued growth of data and the high-availability requirements of applications have created a critical and mounting demand for storage-efficient, high-performance data protection. New technologies, especially D2D (Disk-to-Disk) de-duplication storage, have therefore received wide attention in both academia and industry in recent years. Existing de-duplication systems rely mainly on duplicate locality within the backup workload to achieve high throughput, but suffer degraded read performance when duplicate locality is poor. This paper presents the design and performance evaluation of a D2D-based de-duplication file backup system that employs caching techniques to improve write throughput while encoding files as graphs called BP-DAGs (Bi-Pointer-based Directed Acyclic Graphs). BP-DAGs not only satisfy the 'unique' chunk storing policy of de-duplication, but also help improve file read performance on workloads with poor duplicate locality. Evaluation results show that the system achieves read performance comparable to that of non-de-duplication backup systems such as Bacula under representative workloads, and that the metadata storage overhead of BP-DAGs is reasonably low.

Highlights

  • Data explosion [1] has been forcing backups to expand storage capacity, confronting modern enterprises with significant cost pressures and data management challenges

  • This paper focuses mainly on data de-duplication and BP-DAGs rather than on backup job management, so the rest of the section is dedicated to the workflow of the backup agent and the storage server

  • If a chunk is new, it is stored in the container and its new address is built into the BP-DAG; otherwise, it is discarded and its address pointer is copied from the fingerprint cache into the BP-DAG

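The write path described in the last highlight can be sketched as follows. This is a minimal illustration, not the paper's actual API: the function name, the dict-based fingerprint cache, and the list-based container with offsets standing in for disk addresses are all assumptions for clarity.

```python
import hashlib

def backup_chunk(chunk, fingerprint_cache, container, dag_pointers):
    """Illustrative de-duplication write path: a new chunk is appended to
    the container and its fresh address enters the BP-DAG; a duplicate is
    discarded and its address pointer is copied from the fingerprint cache."""
    fp = hashlib.sha1(chunk).digest()
    if fp in fingerprint_cache:
        # Duplicate chunk: reuse the cached address, store nothing.
        address = fingerprint_cache[fp]
    else:
        # New chunk: store it once and remember where it went.
        address = len(container)          # hypothetical container offset
        container.append(chunk)
        fingerprint_cache[fp] = address
    # The BP-DAG edge keeps both pointers: hash plus address.
    dag_pointers.append((fp, address))
    return address
```

Feeding the same chunk twice stores it only once, while both BP-DAG edges still resolve to a valid address.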

Summary

INTRODUCTION

Data explosion [1] has been forcing backups to expand storage capacity, confronting modern enterprises with significant cost pressures and data management challenges. The key challenge for modern enterprise data protection is to construct storage-efficient backup systems with high throughput on both data writes and reads. Most existing de-duplication systems use caching techniques that judiciously exploit duplicate locality within the backup stream to avoid the disk-index bottleneck and achieve high de-duplication throughput [9, 10]. In existing de-duplication systems, file chunks are indexed by their fingerprints (i.e., hash pointers), an approach known as Content-Addressed Storage (CAS) [14]. To maintain high read throughput under various workloads, files are encoded as graphs called Bi-Pointer-based Directed Acyclic Graphs (BP-DAGs), whose nodes hold variable-sized chunks of data and whose edges are hash-plus-address pointers.
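The bi-pointer edge structure described above can be sketched as plain data types. This is an assumed shape for illustration only: the class names, the SHA-1 fingerprint, and the `(container_id, offset)` locator are not taken from the paper.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class BiPointer:
    """Edge of a BP-DAG: a hash pointer (the CAS fingerprint) paired
    with a direct address pointer locating the stored chunk on disk."""
    fingerprint: bytes   # content hash of the target chunk
    address: tuple       # hypothetical (container_id, offset) locator

@dataclass
class BPDAGNode:
    """Node of a BP-DAG: a variable-sized chunk of data, with
    bi-pointer edges to any child nodes (leaves have no children)."""
    data: bytes
    children: list = field(default_factory=list)

def fingerprint(chunk: bytes) -> bytes:
    """Hash pointer for a chunk; SHA-1 is assumed here."""
    return hashlib.sha1(chunk).digest()
```

The address pointer lets a reader fetch a chunk directly without a fingerprint-index lookup, which is what preserves read throughput when duplicate locality is poor, while the hash pointer still enables duplicate detection and integrity checks.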

THE STORAGE-EFFICIENT FILE BACKUP
System Architecture
De-duplication Backup Process
Write-Once Storage Policy
BI-POINTER-BASED DIRECTED ACYCLIC GRAPHS
The Structure of BP-DAGs
BP-DAGs Building
Restoring Files from BP-DAGs
EXPERIMENTAL EVALUATION
System Setup
Results and Discussions
CONCLUSIONS
