Optimal partitioning of data chunks in deduplication systems

M. Hirsch, A. Ish-Shalom, S.T. Klein

doi:10.1016/j.dam.2015.12.018

Open Access

Journal: Discrete Applied Mathematics
Publication Date: Feb 2, 2016
Citations: 1
License type: elsevier-specific: oa user license

Affiliation: IBM Research - Haifa, Bar-Ilan University

Abstract

Deduplication is a special case of data compression in which repeated chunks of data are stored only once. For very large chunks, this process may be applied even if the chunks are similar and not necessarily identical, in which case the encoding of duplicate data consists of a sequence of pointers to matching parts. However, not all pointers are worth keeping, as each incurs some storage overhead. A linear-time, sub-optimal solution to this partition problem is presented, followed by an optimal solution with cubic time complexity and quadratic space requirements.
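
The trade-off between keeping a pointer and storing the matched bytes again can be illustrated with a toy cost model. The sketch below is not the algorithm of the paper, whose cost model and optimal cubic-time solution are not given in the abstract. It assumes, purely for illustration, that a kept pointer costs a fixed `pointer_cost`, that every maximal run of literal bytes costs `header_cost` plus its length, and that a dropped match is stored as literal bytes. The function name, the parameters, and the `("match" | "literal", length)` item format are all hypothetical.

```python
def min_encoding_cost(items, pointer_cost, header_cost):
    """Minimum size of an encoding under the toy model described above.

    items: list of (kind, length) pairs, kind being "match" or "literal".
    Assumed costs (not taken from the paper): a kept match costs
    pointer_cost; each maximal literal run costs header_cost + its bytes.
    """
    INF = float("inf")
    closed = 0      # best cost so far if the last item was kept as a pointer
    open_run = INF  # best cost so far if the last item ends a literal run

    for kind, length in items:
        # Store the item as literal bytes: extend the current literal run,
        # or start a new run and pay for its header.
        as_literal = min(open_run, closed + header_cost) + length
        if kind == "match":
            # Keep the pointer: any open literal run ends here.
            as_pointer = min(closed, open_run) + pointer_cost
            closed, open_run = as_pointer, as_literal
        else:
            # Non-matching data can only be stored literally.
            closed, open_run = INF, as_literal

    return min(closed, open_run)


# A short match between two literal blocks is not worth a pointer here:
# dropping it merges everything into one literal run.
short = [("literal", 10), ("match", 3), ("literal", 10)]
# A long match is worth its pointer.
long_ = [("literal", 10), ("match", 1000), ("literal", 10)]

print(min_encoding_cost(short, pointer_cost=8, header_cost=4))
print(min_encoding_cost(long_, pointer_cost=8, header_cost=4))
```

Under this simplified additive model a single left-to-right pass suffices. The cubic-time bound stated in the abstract suggests that the paper's actual cost model is richer than this one.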

Full Text