Improving Read Throughput of Deduplicated Cloud Storage using Frequent Pattern-Based Prefetching Technique

Prabavathy Balasundaram, Chitra Babu, Subha Devi M

doi:10.1093/comjnl/bxw013

Abstract

In cloud storage, deduplication enables optimal use of storage space, but it also incurs a substantial overhead in maintaining metadata, namely the Fingerprint Index and the File Recipe. Because this metadata is large, it must be stored on disk, which causes considerable read latency in Deduplicated Cloud Storage (DCS). To reduce this read latency, it is highly beneficial to prefetch the relevant fingerprints into a cache. Many existing solutions exploit either spatial locality or similarity among files to prefetch the relevant fingerprints. However, the DCS designed and implemented in this paper is intended for non-backup workloads, which do not exhibit significant spatial locality or similarity among files. Hence, this paper proposes an alternative prefetching approach that mines the pattern of client read accesses to find the most frequently accessed files. The proposed prefetching approach has been implemented and incorporated into the DCS. Experimental investigations indicate that the proposed approach improves cache hit rates by 140% and increases read throughput by 88% compared with the Extreme Binning approach (Bhagwat, D., Eshghi, K., Long, D.D.E. and Lillibridge, M. (2009) Extreme Binning: Scalable, Parallel Deduplication for Chunk-based File Backup. Proc. MASCOTS'09, London, UK, September 21–23, pp. 1–9. IEEE), while incurring only a marginal computational overhead of 1.7 s.
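The idea of prefetching fingerprints for frequently read files can be illustrated with a minimal sketch. It is not the paper's algorithm: simple access counting with a minimum-support threshold stands in for the frequent-pattern mining step, and the names `read_log`, `file_recipes`, `min_support`, and `cache_capacity` are hypothetical, introduced only for this example.

```python
from collections import Counter


def frequent_files(read_log, min_support):
    """Return files read at least min_support times, most frequent first.

    Assumption: read_log is a plain list of file names, one per client read.
    """
    counts = Counter(read_log)
    return [name for name, count in counts.most_common() if count >= min_support]


def prefetch_fingerprints(read_log, file_recipes, min_support, cache_capacity):
    """Collect fingerprints of frequently read files, up to cache_capacity.

    Assumption: file_recipes maps a file name to its ordered list of
    chunk fingerprints (a simplified stand-in for the File Recipe).
    """
    cache = []    # prefetched fingerprints, in prefetch order
    seen = set()  # avoids caching a shared (deduplicated) fingerprint twice
    for name in frequent_files(read_log, min_support):
        for fingerprint in file_recipes.get(name, []):
            if len(cache) >= cache_capacity:
                return cache
            if fingerprint not in seen:
                seen.add(fingerprint)
                cache.append(fingerprint)
    return cache


if __name__ == "__main__":
    log = ["a", "b", "a", "c", "a", "b"]
    recipes = {"a": ["f1", "f2"], "b": ["f2", "f3"], "c": ["f4"]}
    print(prefetch_fingerprints(log, recipes, min_support=2, cache_capacity=3))
```

In this toy run, files `a` and `b` meet the support threshold, so their fingerprints are cached ahead of any request, while the rarely read file `c` is left to be fetched from disk on demand.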
