Abstract
Deduplication and compression are powerful techniques to reduce the amount of physical storage consumed relative to the quantity of logical data stored. Deduplication can, however, impose significant performance overheads, because duplicate detection in large systems induces random accesses to the backend storage. These random accesses have led to the concern that deduplication for primary storage is not compatible with HDDs. Most inline data reduction solutions are therefore optimized for SSDs, and their use with HDDs is discouraged even for sequential workloads. In this work, we show that these concerns are valid if and only if the lessons learned from deduplication research are not applied. We have therefore investigated data reduction solutions for primary storage based on the Red Hat Virtual Data Optimizer (VDO) and show that applying them directly can decrease sequential write performance on HDDs by 36×. We then show that slight modifications to VDO, plus the integration of a very small SSD area, improve performance significantly, even beyond the performance achieved without data reduction, making HDDs more cost-efficient than SSDs for a wide range of mostly sequential cloud workloads. Additionally, these VDO optimizations do not require maintaining separate code bases for HDDs and SSDs, and we therefore provide the first data reduction solution applicable to both storage media.
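To make the overhead mentioned above concrete, the following is a minimal Python sketch of inline block-level deduplication: every incoming block is fingerprinted and looked up in a fingerprint index, and in a large system that index cannot fit in memory, so each lookup becomes a random access to the backend storage. The 4 KiB block size, the SHA-256 fingerprint, and the in-memory dictionary standing in for the on-disk index are illustrative assumptions, not VDO's actual implementation.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed-size blocks


def deduplicate(stream, index):
    """Write a stream of equally sized blocks, sharing storage for duplicates.

    `index` maps a block fingerprint to the physical address of an existing
    copy. In a real large-scale system this index lives (at least partly) on
    the backend device, so every lookup here corresponds to a potential
    random access -- the source of the overhead discussed in the abstract.
    """
    physical_blocks = []   # simulated physical block store
    logical_map = []       # logical block number -> physical address

    for block in stream:
        fingerprint = hashlib.sha256(block).digest()
        addr = index.get(fingerprint)      # random-access index lookup
        if addr is None:                   # new content: store it
            addr = len(physical_blocks)
            physical_blocks.append(block)
            index[fingerprint] = addr
        logical_map.append(addr)           # duplicate: only remap

    return logical_map, physical_blocks


if __name__ == "__main__":
    blocks = [b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE, b"A" * BLOCK_SIZE]
    mapping, store = deduplicate(blocks, index={})
    print(mapping, len(store))  # [0, 1, 0] 2 -- three logical blocks, two physical
```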