Abstract

Data deduplication, an efficient space-reduction technique, has gained increasing attention and popularity in data-intensive storage systems. Most existing state-of-the-art deduplication methods remove redundant data at either the file level or the chunk level, which incurs unavoidable and significant time overheads due to chunking and fingerprinting. These overheads can degrade write performance to an unacceptable level in a data storage system. In this paper, we propose P-Dedupe, a fast and scalable deduplication system. The main idea behind P-Dedupe is to pipeline the deduplication process and parallelize its hash computations, effectively exploiting the idle resources of modern computer systems with multi-core and many-core processor architectures. Our experimental evaluation of the P-Dedupe prototype on real-world datasets shows that P-Dedupe speeds up deduplication write throughput by a factor of 2~4 by pipelining deduplication and parallelizing hash calculation, and achieves 80%~250% of the throughput of a conventional storage system without data deduplication.
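The two ideas named in the abstract, pipelining the deduplication stages and parallelizing hash calculation, can be illustrated with a minimal sketch. The Go code below is not the authors' implementation; fixed-size chunking, SHA-1 fingerprints, an in-memory index, and the worker count of 4 are all illustrative assumptions.

```go
// Minimal sketch of a pipelined deduplication path with parallel
// fingerprinting. Each stage runs concurrently and hands chunks to the
// next stage over a channel; the fingerprint stage fans out across cores.
package main

import (
	"crypto/sha1"
	"fmt"
	"sync"
)

type chunk struct {
	data []byte
	fp   [20]byte // SHA-1 fingerprint of the chunk data
}

// chunkStage splits the input into fixed-size chunks (a real deduplication
// system would typically use content-defined chunking instead).
func chunkStage(data []byte, size int) <-chan chunk {
	out := make(chan chunk)
	go func() {
		defer close(out)
		for i := 0; i < len(data); i += size {
			end := i + size
			if end > len(data) {
				end = len(data)
			}
			out <- chunk{data: data[i:end]}
		}
	}()
	return out
}

// fingerprintStage computes fingerprints in parallel across `workers`
// goroutines, exploiting otherwise idle cores.
func fingerprintStage(in <-chan chunk, workers int) <-chan chunk {
	out := make(chan chunk)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range in {
				c.fp = sha1.Sum(c.data)
				out <- c
			}
		}()
	}
	go func() { wg.Wait(); close(out) }()
	return out
}

// indexStage looks up each fingerprint in an in-memory index; only chunks
// with unseen fingerprints would be written to storage.
func indexStage(in <-chan chunk) (unique, dup int) {
	seen := make(map[[20]byte]bool)
	for c := range in {
		if seen[c.fp] {
			dup++
		} else {
			seen[c.fp] = true
			unique++
		}
	}
	return
}

func main() {
	data := make([]byte, 1<<20) // 1 MiB of zero bytes: highly redundant input
	chunks := chunkStage(data, 4096)
	fps := fingerprintStage(chunks, 4) // assume 4 cores are available
	u, d := indexStage(fps)
	fmt.Printf("unique chunks: %d, duplicate chunks: %d\n", u, d)
}
```

Note that the parallel fingerprint stage emits chunks out of order; a complete system must still preserve the original chunk order when recording file metadata, which this sketch omits.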
