Improving duplicate elimination in storage systems

Deepak R Bobbarjung,Cezary Dubnicki,Suresh Jagannathan

doi:10.1145/1210596.1210599

Open Access

https://doi.org/10.1145/1210596.1210599

Abstract

Minimizing the amount of data that must be stored and managed is a key goal for any storage architecture that purports to be scalable. One way to achieve this goal is to avoid maintaining duplicate copies of the same data. Eliminating redundant data at the source by not writing data which has already been stored not only reduces storage overheads, but can also improve bandwidth utilization. For these reasons, in the face of today's exponentially growing data volumes, redundant data elimination techniques have assumed critical significance in the design of modern storage systems. Intelligent object partitioning techniques identify data that is new when objects are updated, and transfer only these chunks to a storage server. In this article, we propose a new object partitioning technique, called fingerdiff, that improves upon existing schemes in several important respects. Most notably, fingerdiff dynamically chooses a partitioning strategy for a data object based on its similarities with previously stored objects in order to improve storage and bandwidth utilization. We present a detailed evaluation of fingerdiff, and other existing object partitioning schemes, using a set of real-world workloads. We show that for these workloads, the duplicate elimination strategies employed by fingerdiff improve storage utilization on average by 25%, and bandwidth utilization on average by 40% over comparable techniques.
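
The general idea of partitioning an object into chunks and transferring only the chunks a server has not yet seen can be illustrated with a minimal sketch. This is not fingerdiff itself, whose partitioning strategy the abstract does not spell out; it is a generic content-defined chunking scheme of the kind the abstract refers to as an existing object partitioning technique. The names split_chunks and ChunkStore, the polynomial rolling hash, and the values of WINDOW, MASK, BASE, and MOD are all assumptions chosen for illustration.

```python
import hashlib

# Illustrative parameters (assumptions, not taken from the paper).
WINDOW = 16                     # bytes covered by the rolling hash
MASK = (1 << 6) - 1             # boundary when the low 6 bits are zero
BASE = 257
MOD = (1 << 31) - 1
POW = pow(BASE, WINDOW - 1, MOD)


def split_chunks(data: bytes) -> list[bytes]:
    """Split data at positions chosen by a rolling hash of the last WINDOW bytes.

    Because a boundary depends only on nearby content, an insertion early in
    an object does not shift the boundaries found later in it.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Drop the byte that just left the window.
            h = (h - data[i - WINDOW] * POW) % MOD
        h = (h * BASE + byte) % MOD
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ChunkStore:
    """Hypothetical storage server that keeps one copy of each distinct chunk."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def put(self, data: bytes) -> tuple[list[str], int]:
        """Store an object; return its chunk recipe and the bytes actually sent."""
        recipe, sent = [], 0
        for chunk in split_chunks(data):
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:      # only new chunks are transferred
                self.chunks[key] = chunk
                sent += len(chunk)
            recipe.append(key)
        return recipe, sent

    def get(self, recipe: list[str]) -> bytes:
        """Rebuild an object from its recipe."""
        return b"".join(self.chunks[key] for key in recipe)
```

Storing a second version of an object that differs by a small insertion would be expected to send far fewer bytes than the object's full length, since most of its chunks are already present in the store. How much is saved depends on the chunk-size parameters, which is the trade-off that a scheme adapting its partitioning per object aims to manage.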

Journal: ACM Transactions on Storage
Publication Date: Nov 1, 2006
Citations: 175

Similar Papers

Design of a Scheduling Algorithm for Improving Bandwidth Resource Utilization in IEEE 802.16 Wireless Broadband Networks (original title: IEEE 802.16 無線寬頻網路中，提升頻寬資源利用率之排程演算法設計)
01 Jan 2013

Improving DRAM Bandwidth Utilization with MLP-Aware OS Paging
Rishiraj A Bheda ... Thomas M Conte
03 Oct 2016

Segment Scheduling Scheme for Efficient Bandwidth Utilization of HTTP Adaptive Streaming in Multipath Environments
Heekwang Kim ... Kwangsue Chung
IEEE Access | VOL. 7
01 Jan 2019

Integer-multiple-spacing-based scheduling for multimedia applications in IEEE 802.11e HCCA wireless networks
Li Feng ... Jianqing Li
Computer Networks | VOL. 56
03 Sep 2012
