Abstract
Deduplication and compression are powerful techniques to reduce the amount of physical storage consumed relative to the quantity of logical data stored. Deduplication can, however, impose significant performance overheads, because duplicate detection in large systems induces random accesses to the backend storage. These random accesses have led to the concern that deduplication for primary storage is not compatible with HDDs. Most inline data reduction solutions are therefore optimized for SSDs, and their use with HDDs is discouraged even for sequential workloads. In this work, we show that these concerns are valid if and only if the lessons learned from deduplication research are not applied. We have therefore investigated data reduction solutions for primary storage based on the Red Hat Virtual Data Optimizer (VDO) and show that applying them directly can decrease sequential write performance on HDDs by 36×. We then show that slight modifications to VDO, plus the integration of a very small SSD area, improve performance significantly, even beyond the performance achieved without data reduction, making HDDs more cost-efficient than SSDs for a wide range of mostly sequential cloud workloads. Additionally, these VDO optimizations do not require maintaining separate code bases for HDDs and SSDs, and we therefore provide the first data reduction solution applicable to both storage media.
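To make the overhead mentioned above concrete, the following is a minimal Python sketch of inline block-level deduplication: every incoming block is fingerprinted and looked up in a fingerprint index, and in a large system that index cannot fit in memory, so each lookup becomes a random access to the backend storage. The 4 KiB block size, the SHA-256 fingerprint, and the in-memory dictionary standing in for the on-disk index are illustrative assumptions, not VDO's actual implementation.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed-size blocks


def deduplicate(stream, index):
    """Write a stream of equally sized blocks, sharing storage for duplicates.

    `index` maps a block fingerprint to the physical address of an existing
    copy. In a real large-scale system this index lives (at least partly) on
    the backend device, so every lookup here corresponds to a potential
    random access -- the source of the overhead discussed in the abstract.
    """
    physical_blocks = []   # simulated physical block store
    logical_map = []       # logical block number -> physical address

    for block in stream:
        fingerprint = hashlib.sha256(block).digest()
        addr = index.get(fingerprint)      # random-access index lookup
        if addr is None:                   # new content: store it
            addr = len(physical_blocks)
            physical_blocks.append(block)
            index[fingerprint] = addr
        logical_map.append(addr)           # duplicate: only remap

    return logical_map, physical_blocks


if __name__ == "__main__":
    blocks = [b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE, b"A" * BLOCK_SIZE]
    mapping, store = deduplicate(blocks, index={})
    print(mapping, len(store))  # [0, 1, 0] 2 -- three logical blocks, two physical
```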