Abstract

Encrypted deduplicated storage systems face multiple challenges: in long-term backup workloads, metadata storage can approach the size of the deduplicated data itself, and fingerprint indexes that do not fit in memory cause the disk bottleneck problem. Resource contention can also arise when peak-time backup loads force deduplication of many concurrent backup streams. We propose a system that performs both client-side and server-side deduplication. The client forms segments from groups of chunks, enabling coarse-grained client-side deduplication. Deduplicating at segment granularity reduces server memory requirements and metadata upload and storage volumes, but can cause some previously uploaded data chunks to be uploaded again. For longer-term workloads, this redundant upload volume can be offset by the reduced metadata upload volume. On the server, chunk-level deduplication is performed by merge-sorting the fingerprints of multiple clients in a single pass. The server-side deduplication achieves high levels of duplicate elimination, and we exploit index locality to reduce resource contention between concurrent backup streams and to significantly reduce disk I/O for indexing. In experiments with two real-world backup datasets using an 8KiB chunk size and 512KiB-4MiB segment sizes, server memory requirements were reduced by up to 81.6% and metadata storage volume by up to 97.2%. Client-to-server upload volume ranged from a 23.9% reduction to a 13.3% increase. Exploiting index locality reduced index I/Os for 2MiB segments to fewer than 350 per backup generation.
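
To make the two-level scheme concrete, the sketch below illustrates the idea under stated assumptions rather than reproducing the paper's implementation: fixed-size 8KiB chunks fingerprinted with SHA-256, segments formed from runs of contiguous chunk fingerprints, and a sorted on-disk chunk index that can be scanned sequentially. All function names and data structures here are illustrative.

```python
import hashlib
import heapq
from typing import Iterable, Iterator, List, Set

CHUNK_SIZE = 8 * 1024            # 8KiB chunks, as in the experiments
SEGMENT_SIZE = 2 * 1024 * 1024   # 2MiB segments (one of the evaluated sizes)
CHUNKS_PER_SEGMENT = SEGMENT_SIZE // CHUNK_SIZE

def chunk_fingerprints(data: bytes) -> List[bytes]:
    """Fingerprint fixed-size chunks (a sketch; the system may use content-defined chunking)."""
    return [hashlib.sha256(data[i:i + CHUNK_SIZE]).digest()
            for i in range(0, len(data), CHUNK_SIZE)]

def segment_fingerprint(chunk_fps: List[bytes]) -> bytes:
    """Identify a segment by a hash over its chunk fingerprints."""
    return hashlib.sha256(b"".join(chunk_fps)).digest()

def client_dedup(data: bytes, known_segments: Set[bytes]) -> List[List[bytes]]:
    """Coarse-grained client-side deduplication: only segments whose
    fingerprint has not been seen before are uploaded. A single changed
    chunk forces re-upload of its whole segment, which is the redundant
    upload volume the abstract mentions."""
    fps = chunk_fingerprints(data)
    to_upload = []
    for i in range(0, len(fps), CHUNKS_PER_SEGMENT):
        seg = fps[i:i + CHUNKS_PER_SEGMENT]
        seg_fp = segment_fingerprint(seg)
        if seg_fp not in known_segments:
            known_segments.add(seg_fp)
            to_upload.append(seg)
    return to_upload

def server_dedup(per_client_fps: Iterable[List[bytes]],
                 stored_index_sorted: Iterator[bytes]) -> List[bytes]:
    """Chunk-level server-side deduplication: merge-sort the fingerprints of
    all clients and compare them against the sorted on-disk index in one
    sequential pass, keeping only fingerprints not already stored."""
    incoming = heapq.merge(*(sorted(fps) for fps in per_client_fps))
    stored_it = iter(stored_index_sorted)
    stored = next(stored_it, None)
    new_chunks, last = [], None
    for fp in incoming:
        if fp == last:                      # duplicate within this batch
            continue
        while stored is not None and stored < fp:
            stored = next(stored_it, None)  # advance the sequential index scan
        if fp != stored:                    # not present in the stored index
            new_chunks.append(fp)
        last = fp
    return new_chunks
```

Sorting the incoming fingerprints lets the on-disk index be read sequentially instead of probed at random, which is how a single-pass merge mitigates the disk bottleneck described above.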
