An Enterprise-Grade Open-Source Data Reduction Architecture for All-Flash Storage Systems

Mohammadamin Ajdari, Patrick Raaf, Reza Salkhordeh, Hossein Asadi, Mostafa Kishani, André Brinkmann

doi:10.1145/3530896

Abstract

All-flash storage (AFS) systems have become an essential infrastructure component to support enterprise applications, where sub-millisecond latency and very high throughput are required. Nevertheless, the price per capacity of solid-state drives (SSDs) is relatively high, which has encouraged system architects to adopt data reduction techniques, mainly deduplication and compression, in enterprise storage solutions. To provide higher reliability and performance, SSDs are typically grouped using redundant array of independent disks (RAID) configurations. Data reduction on top of RAID arrays, however, adds I/O overheads and also complicates the I/O patterns redirected to the underlying backend SSDs, which invalidates the best-practice configurations used in AFS. Unfortunately, existing works on the performance of data reduction do not consider its interaction and I/O overheads with other enterprise storage components, including SSD arrays and RAID controllers. In this paper, using a real setup with enterprise-grade components and based on the open-source data reduction module RedHat VDO, we reveal novel observations on the performance gap between the state-of-the-art and the optimal all-flash storage stack with integrated data reduction. We therefore explore the I/O patterns at the storage entry point and compare them with those at the disk subsystem. Our analysis shows a significant amount of I/O overhead for guaranteeing consistency and avoiding data loss through data journaling, frequent small-sized metadata updates, and duplicate content verification. We accompany these observations with cross-layer optimizations to enhance the performance of AFS, which range from deriving new optimal hardware RAID configurations up to introducing changes to the enterprise storage stack.
By analyzing the characteristics of I/O types and their overheads, we propose three techniques: (a) application-aware lazy persistence, (b) a fast, read-only I/O cache for duplicate verification, and (c) disaggregation of block maps and data by offloading block maps to a very fast persistent memory device. By consolidating all proposed optimizations and implementing them in an enterprise AFS, we show 1.3× to 12.5× speedup over the baseline AFS with 90% data reduction, and from 7.8× up to 57× performance/cost improvement over an optimized AFS (with no data reduction) running applications ranging from 100% read-only to 100% write-only accesses.
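
As a rough illustration of technique (b), the following Python sketch shows how a small read-only cache might absorb the reads that duplicate verification issues before a write is deduplicated. It is not VDO's implementation: the class names DedupStore and VerifyCache, the LRU eviction policy, the SHA-256 fingerprint, and the in-memory dictionaries standing in for the SSD array and block map are all assumptions made for illustration.

```python
import hashlib
from collections import OrderedDict


class VerifyCache:
    """Hypothetical LRU cache of block contents, filled only by verification reads."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # physical address -> block contents
        self.hits = 0
        self.misses = 0

    def read(self, addr, backend):
        if addr in self.blocks:
            self.blocks.move_to_end(addr)  # mark as most recently used
            self.hits += 1
            return self.blocks[addr]
        self.misses += 1
        data = backend[addr]  # stands in for a read from the SSD array
        self.blocks[addr] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data


class DedupStore:
    """Toy block store: fingerprint index plus byte-wise duplicate verification."""

    def __init__(self, cache_capacity=2):
        self.backend = {}    # physical address -> block contents
        self.index = {}      # fingerprint -> physical address
        self.block_map = {}  # logical address -> physical address
        self.cache = VerifyCache(cache_capacity)
        self.next_addr = 0

    def write(self, lba, data):
        """Return True if a new physical block was written, False if deduplicated."""
        fp = hashlib.sha256(data).digest()
        addr = self.index.get(fp)
        # Verify the candidate duplicate byte by byte, reading through the cache.
        if addr is not None and self.cache.read(addr, self.backend) == data:
            self.block_map[lba] = addr
            return False
        addr = self.next_addr
        self.next_addr += 1
        self.backend[addr] = data
        self.index[fp] = addr
        self.block_map[lba] = addr
        return True

    def read(self, lba):
        return self.backend[self.block_map[lba]]
```

In this sketch, writing the same 4 KiB block to three logical addresses stores it once; the first verification read misses the cache and goes to the backend, while later ones are served from the cache. Physical blocks are never overwritten here, so the cache needs no invalidation logic, which a real system would have to handle.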

Journal: Proceedings of the ACM on Measurement and Analysis of Computing Systems
Publication Date: May 26, 2022
Citations: 2
