Abstract

Chunk-based de-duplication storage, which optimises storage or bandwidth usage by eliminating duplicate chunks at the inter-file level, has recently attracted broad attention in both academia and industry. In a petabyte-scale de-duplication storage system, the metadata, especially the disk index that maps fingerprints to their corresponding chunks, can reach terabyte scale. In this paper, we propose a disk-resident hash table to implement the disk index, and we study the probability of hash table overflow both theoretically and through extensive experiments. These studies help us design a space-efficient disk index that not only reduces metadata storage but also improves access performance.
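To make the notion of hash table overflow concrete, the following is a minimal sketch, not the analysis from the paper. It assumes fingerprints hash uniformly into fixed-capacity buckets, approximates the occupancy of one bucket by a Poisson distribution, and uses a union bound over all buckets. The function names and parameters (`bucket_overflow_probability`, `table_overflow_bound`, `load`, `capacity`) are hypothetical and chosen only for illustration.

```python
import math


def bucket_overflow_probability(load: float, capacity: int) -> float:
    """Return P(X > capacity) for X ~ Poisson(load).

    Assumption (not from the paper): bucket occupancy under uniform
    hashing is approximated by a Poisson distribution whose mean is
    the average number of fingerprints per bucket.
    """
    if load <= 0:
        return 0.0
    k = capacity + 1
    total = 0.0
    while True:
        # Poisson pmf at k, evaluated in log space to avoid overflow.
        term = math.exp(-load + k * math.log(load) - math.lgamma(k + 1))
        total += term
        # Past the mean the terms shrink; stop once they are negligible.
        if k > load and term <= total * 1e-17:
            break
        k += 1
    return min(1.0, total)


def table_overflow_bound(num_fingerprints: int, num_buckets: int, capacity: int) -> float:
    """Union bound on the probability that at least one bucket overflows."""
    load = num_fingerprints / num_buckets
    return min(1.0, num_buckets * bucket_overflow_probability(load, capacity))


if __name__ == "__main__":
    # Hypothetical sizing: 2**20 buckets, average of 20 fingerprints per bucket.
    buckets = 1 << 20
    for capacity in (30, 40, 50, 60):
        bound = table_overflow_bound(20 * buckets, buckets, capacity)
        print(f"capacity={capacity:3d}  overflow bound={bound:.3e}")
```

Under these assumptions, raising the bucket capacity relative to the average load drives the overflow bound down quickly, at the cost of leaving more bucket slots empty. This is the space versus overflow trade-off that a space-efficient disk index has to balance.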
