Abstract

Existing primary deduplication techniques either use inline caching to exploit locality in primary workloads or use post-processing deduplication to avoid the negative impact on I/O performance. However, neither of them works well in the cloud servers running multiple services for the following two reasons: First, the temporal locality of duplicate data writes varies among primary storage workloads, which makes it challenging to efficiently allocate the inline cache space and achieve a good deduplication ratio. Second, the post-processing deduplication does not eliminate duplicate I/O operations that write to the same logical block address as it is performed after duplicate blocks have been written. A hybrid deduplication mechanism is promising to deal with these problems. Inline fingerprint caching is essential to achieving efficient hybrid deduplication. In this paper, we present a detailed analysis of the limitations of using existing caching algorithms in primary deduplication in the cloud. We reveal that existing caching algorithms either perform poorly or incur significant memory overhead in fingerprint cache management. To address this, we propose a novel fingerprint caching mechanism that estimates the temporal locality of duplicates in different data streams and prioritizes the cache allocation based on the estimation. We integrate the caching mechanism and build a hybrid deduplication system. Our experimental results show that the proposed mechanism provides significant improvement for both deduplication ratio and overhead reduction.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.