Abstract

Beyond high space efficiency, enterprise de-duplication backup also demands high performance, high scalability, and availability in large-scale distributed environments. The main challenge is reducing the significant disk input/output (I/O) overhead incurred by constantly accessing the disk to identify duplicate chunks. Existing inline de-duplication approaches mainly rely on duplicate locality to avoid the disk bottleneck, and thus suffer degraded performance under workloads with poor duplicate locality. This paper presents Chunkfarm, a post-processing de-duplication backup system designed to improve capacity, throughput, and scalability. Chunkfarm performs de-duplication backup using the hash join algorithm, which turns the notoriously small, random disk I/Os of fingerprint lookups and updates into large sequential disk I/Os, achieving high write throughput that is insensitive to workload locality. More importantly, by decentralizing fingerprint lookup and update, Chunkfarm supports a cluster of servers performing de-duplication backup in parallel; it is therefore well suited to distributed implementation and applicable to large-scale distributed storage systems.
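To make the batching idea concrete, the sketch below shows one way a hash-join-style pass can replace per-chunk random index probes: incoming fingerprints are grouped by partition in memory (the build side), then each on-disk index partition is scanned once sequentially (the probe side). This is a minimal illustration of the general technique named in the abstract, not Chunkfarm's actual implementation; the partition count, function names, and in-memory index layout are assumptions made for the example.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 16  # hypothetical partition count, chosen only for illustration


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint of a chunk (SHA-1 here, a common choice in dedup systems)."""
    return hashlib.sha1(chunk).hexdigest()


def partition_of(fp: str) -> int:
    """Route a fingerprint to a partition by its leading hex digits (the join key)."""
    return int(fp[:2], 16) % NUM_PARTITIONS


def dedup_batch(new_chunks, index_partitions):
    """Hash-join a batch of fresh fingerprints against a partitioned fingerprint index.

    new_chunks       : list of (chunk_id, fingerprint) accumulated since the last pass
    index_partitions : dict mapping partition -> set of known fingerprints; in a real
                       system each partition would be read sequentially from disk
    Returns (duplicates, unique) lists of chunk_ids.
    """
    # Build side: group incoming fingerprints by partition in memory.
    build = defaultdict(list)
    for chunk_id, fp in new_chunks:
        build[partition_of(fp)].append((chunk_id, fp))

    duplicates, unique = [], []
    # Probe side: visit each touched index partition once, sequentially.
    for part, entries in build.items():
        known = index_partitions.get(part, set())
        for chunk_id, fp in entries:
            (duplicates if fp in known else unique).append(chunk_id)
            known.add(fp)  # record the fingerprint so later chunks in the batch dedup against it
        index_partitions[part] = known
    return duplicates, unique


# Example: two identical chunks in one batch collapse to a single unique entry.
index = {}
batch = [(1, fingerprint(b"hello")), (2, fingerprint(b"world")), (3, fingerprint(b"hello"))]
dups, uniq = dedup_batch(batch, index)
# dups == [3], uniq == [1, 2]
```

Because every lookup and update is absorbed into one sequential pass over each partition, the per-chunk cost no longer depends on whether consecutive chunks happen to share locality in the index, which is the property the abstract attributes to the hash-join approach.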
