Abstract
Chunk-based de-duplication storage, which aims to optimise storage or bandwidth usage by eliminating duplicate chunks at the inter-file level, has recently attracted broad attention in both academia and industry. For a petabyte-scale de-duplication storage system, the metadata storage, especially the disk index, which establishes a mapping between fingerprints and their corresponding chunks in the system, can reach terabyte scale. In this paper, we propose a disk-resident hash table to implement the disk index, and we study the probability of hash table overflow both theoretically and through extensive experiments. These studies help us design a space-efficient disk index that not only reduces metadata storage but also improves access performance.
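To make the notion of hash table overflow concrete, the following is a minimal sketch and not the analysis developed in the paper. It assumes that fingerprints hash uniformly and independently into fixed-capacity buckets, and it approximates the load of a single bucket with a Poisson distribution. The function names and parameters (`n_fingerprints`, `n_buckets`, `bucket_capacity`) are hypothetical and chosen for illustration only.

```python
import math


def bucket_overflow_probability(n_fingerprints, n_buckets, bucket_capacity):
    """Approximate P(one given bucket receives more than bucket_capacity entries).

    Assumption (not from the paper): uniform, independent hashing, so the
    load of one bucket is roughly Poisson with mean n_fingerprints / n_buckets.
    """
    lam = n_fingerprints / n_buckets
    # Accumulate the Poisson CDF P(X <= bucket_capacity) term by term.
    term = math.exp(-lam)  # P(X = 0); underflows to 0 for very large lam
    cdf = term
    for k in range(1, bucket_capacity + 1):
        term *= lam / k    # P(X = k) from P(X = k - 1)
        cdf += term
    # 1 - cdf loses precision for extremely small tails; fine for a sketch.
    return max(0.0, 1.0 - cdf)


def table_overflow_upper_bound(n_fingerprints, n_buckets, bucket_capacity):
    """Union bound on P(at least one bucket overflows), capped at 1."""
    p = bucket_overflow_probability(n_fingerprints, n_buckets, bucket_capacity)
    return min(1.0, n_buckets * p)


if __name__ == "__main__":
    # Hypothetical sizing: 1,000,000 fingerprints, 100,000 buckets (mean load 10).
    for capacity in (10, 20, 30):
        bound = table_overflow_upper_bound(1_000_000, 100_000, capacity)
        print(capacity, bound)
```

Under these assumptions, increasing the bucket capacity relative to the mean load lowers the overflow probability sharply, at the cost of more unused slots per bucket. That is the kind of space-versus-overflow trade-off a space-efficient disk index has to balance.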