Abstract

Primary storage deduplication systems are performance sensitive: their performance depends on two factors, metadata access for duplicate detection and the strategy used to eliminate duplicate data. Various approaches to duplicate detection through suitable caching mechanisms have been proposed in the literature, and most of them assume that primary workloads exhibit strong temporal locality. This assumption does not hold in the Cloud, where interference among workloads co-located on the same system destroys locality. Moreover, duplicate content among data blocks with different addresses leads to inefficient utilization of the data cache. Applying deduplication in this context causes data blocks to be shared among clients with different access patterns and frequencies, so an LRU cache, which considers only the recency of references, is not appropriate. In this paper, a Hybrid Deduplication System (HDS) is proposed, containing a content-based cache with a new replacement policy, Modified Adaptive Replacement Cache (ARC). The proposed system is simulated in a Linux environment using three different types of FIU traces, and its effectiveness is compared with that of a full deduplication system. Experimental results show that the proposed system consistently outperforms the full deduplication system in reducing metadata overhead for all three data sets.
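To make the idea concrete, the following is a minimal sketch of a content-addressed block cache using the standard ARC replacement policy of Megiddo and Modha, the baseline on which a modified policy such as the one proposed here could build. Keying the cache by a content fingerprint rather than by block address means that duplicate blocks arriving under different addresses share a single cache entry. The class name, capacity parameter, and the choice of SHA-1 fingerprinting are illustrative assumptions; the abstract does not specify how Modified ARC differs from standard ARC.

```python
import hashlib
from collections import OrderedDict

class ARCCache:
    """Content-addressed cache with standard ARC replacement (a sketch,
    not the paper's Modified ARC, whose details are not in the abstract)."""

    def __init__(self, capacity):
        self.c = capacity
        self.p = 0               # adaptive target size for T1
        self.t1 = OrderedDict()  # cached blocks seen once recently
        self.t2 = OrderedDict()  # cached blocks seen at least twice
        self.b1 = OrderedDict()  # ghost list: fingerprints evicted from T1
        self.b2 = OrderedDict()  # ghost list: fingerprints evicted from T2

    def _replace(self, hit_in_b2):
        # Evict from T1 or T2 depending on the adaptive target p.
        if self.t1 and (len(self.t1) > self.p or
                        (hit_in_b2 and len(self.t1) == self.p)):
            key, _ = self.t1.popitem(last=False)
            self.b1[key] = None
        else:
            key, _ = self.t2.popitem(last=False)
            self.b2[key] = None

    def request(self, block):
        """Reference a data block; return True on a cache hit."""
        key = hashlib.sha1(block).hexdigest()  # content address
        if key in self.t1:                     # hit: promote to T2
            del self.t1[key]
            self.t2[key] = block
            return True
        if key in self.t2:                     # hit: refresh recency in T2
            self.t2.move_to_end(key)
            return True
        if key in self.b1:                     # ghost hit: grow T1's share
            self.p = min(self.c, self.p + max(len(self.b2) // len(self.b1), 1))
            self._replace(False)
            del self.b1[key]
            self.t2[key] = block
            return False
        if key in self.b2:                     # ghost hit: grow T2's share
            self.p = max(0, self.p - max(len(self.b1) // len(self.b2), 1))
            self._replace(True)
            del self.b2[key]
            self.t2[key] = block
            return False
        # Complete miss: make room, then insert as a recent entry.
        total = len(self.t1) + len(self.t2) + len(self.b1) + len(self.b2)
        if len(self.t1) + len(self.b1) == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(False)
            else:
                self.t1.popitem(last=False)
        elif total >= self.c:
            if total == 2 * self.c:
                self.b2.popitem(last=False)
            self._replace(False)
        self.t1[key] = block
        return False
```

Note that two writes of identical content from different clients produce one fingerprint and therefore one cache entry, which is exactly the sharing effect the abstract argues makes pure recency-based (LRU) replacement inappropriate.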
