A study on disk index design for large scale de-duplication storage systems

Tian Ming Yang, Jing Ning Liu, Dan Feng, Wen Kuang Chou

doi:10.1504/ijcse.2015.067074

Journal: International Journal of Computational Science and Engineering
Publication Date: Jan 1, 2015
Citations: 1

Affiliation: Wuhan National Laboratory for Optoelectronics, Providence University

Abstract

Chunk-based de-duplication storage, which aims to optimise storage or bandwidth usage by eliminating duplicate chunks at the inter-file level, has recently attracted broad attention in both academia and industry. For a petabyte-scale de-duplication storage system, the metadata, especially the disk index that maps fingerprints to their corresponding chunks, can reach terabytes in size. In this paper, we propose a disk-resident hash table to implement the disk index, and we study the probability of hash table overflow both theoretically and through extensive experiments. These studies help us design a space-efficient disk index that not only reduces metadata storage but also improves access performance.
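The abstract does not describe the bucket layout or the overflow analysis, so the following Python code is only an illustrative sketch under assumed parameters, not the authors' design. It assumes 20-byte SHA-1 fingerprints, a hypothetical 40-byte index entry, an 8 KiB average chunk size, fixed-capacity buckets addressed by the fingerprint's leading bytes, and uniform hashing. Under that last assumption, a bucket's load follows Binomial(n, 1/B). All helper names (`index_size_bytes`, `bucket_of`, `bucket_overflow_prob`, `table_overflow_bound`, `min_bucket_capacity`, `simulated_overflow_fraction`) are hypothetical. The code estimates the index size behind the terabyte-scale claim, computes the per-bucket overflow probability, bounds whole-table overflow with a union bound, and cross-checks the analysis with a simulation.

```python
import hashlib
import math

# Illustrative assumptions, not values from the paper: 20-byte SHA-1
# fingerprints, 20 bytes of chunk-location metadata per entry, 8 KiB chunks.
ENTRY_BYTES = 20 + 20
AVG_CHUNK_BYTES = 8 * 1024


def index_size_bytes(unique_data_bytes: int) -> int:
    """Rough disk-index size: one fixed-size entry per unique chunk."""
    return (unique_data_bytes // AVG_CHUNK_BYTES) * ENTRY_BYTES


def bucket_of(fingerprint: bytes, n_buckets: int) -> int:
    """Map a fingerprint to a bucket. Cryptographic fingerprints are already
    close to uniform, so their leading 8 bytes serve directly as the hash."""
    return int.from_bytes(fingerprint[:8], "big") % n_buckets


def bucket_overflow_prob(n: int, n_buckets: int, cap: int) -> float:
    """P(a given bucket receives more than `cap` of `n` fingerprints) under
    uniform hashing: the upper tail of Binomial(n, 1/n_buckets).
    Terms are computed in log space so that large n does not overflow."""
    p = 1.0 / n_buckets
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    mean = n * p
    tail = 0.0
    for k in range(cap + 1, n + 1):
        log_term = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        term = math.exp(log_term)
        tail += term
        # Beyond the mean the terms shrink quickly; stop once negligible.
        if k > mean and term <= tail * 1e-17:
            break
    return tail


def table_overflow_bound(n: int, n_buckets: int, cap: int) -> float:
    """Union bound on P(at least one of the n_buckets buckets overflows)."""
    return min(1.0, n_buckets * bucket_overflow_prob(n, n_buckets, cap))


def min_bucket_capacity(n: int, n_buckets: int, target: float) -> int:
    """Smallest bucket capacity whose union-bounded table overflow
    probability is at most `target`."""
    cap = math.ceil(n / n_buckets)
    while table_overflow_bound(n, n_buckets, cap) > target:
        cap += 1
    return cap


def simulated_overflow_fraction(n: int, n_buckets: int, cap: int) -> float:
    """Insert n synthetic SHA-1 fingerprints and return the fraction of
    buckets whose load exceeds cap."""
    counts = [0] * n_buckets
    for i in range(n):
        fp = hashlib.sha1(i.to_bytes(8, "big")).digest()
        counts[bucket_of(fp, n_buckets)] += 1
    return sum(c > cap for c in counts) / n_buckets


if __name__ == "__main__":
    print(f"index for 1 PB of unique data: {index_size_bytes(10**15) / 1e12:.1f} TB")
    n, buckets, cap = 204_800, 4096, 60  # 50 fingerprints per bucket on average
    print("analytical per-bucket overflow:", bucket_overflow_prob(n, buckets, cap))
    print("simulated overflowing fraction:", simulated_overflow_fraction(n, buckets, cap))
    c = min_bucket_capacity(n, buckets, 1e-6)
    print(f"capacity for table overflow <= 1e-6: {c}, load factor {n / (buckets * c):.2f}")
```

Under these assumptions, 1 PB of unique data yields about 1.2 × 10^11 chunks and an index of roughly 4.9 TB. `min_bucket_capacity` shows the trade-off a space-efficient design must balance. Larger buckets make overflow rarer, but they also lower the load factor n / (B · cap), which wastes more disk space.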

Full Text