Abstract

In today's enterprise storage systems, supporting services such as data deduplication are becoming a common feature in the data center, especially as new storage technologies mature. Static partitioning of storage system resources, including CPU cores and memory caches, may lead to missing Service Level Agreement (SLA) thresholds, such as the Data Reduction Rate (DRR) or IO latency. However, typical storage system applications exhibit workload patterns that can be learned. By learning these patterns, we are better equipped to address several storage system resource partitioning challenges, issues that cannot be overcome with traditional manual tuning and primitive feedback mechanisms. We propose a Content-Aware Learning Cache (CALC) that uses online reinforcement learning models (Q-Learning, SARSA, and Actor-Critic) to actively partition the storage system cache between a data digest cache, a content cache, and an address-based data cache to improve cache hit performance while maximizing data reduction rates. Using traces from popular storage applications, we show how our machine learning approach is robust and can outperform an iterative search method for various datasets and cache sizes. Our content-aware learning cache improves hit rates by 7.1% when compared to iterative search methods, and by 18.2% when compared to a traditional LRU-based data cache implementation.
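To make the approach described above concrete, the following is a minimal sketch (not the authors' implementation) of how an online Q-learning agent could re-partition a fixed cache budget across the three segments named in the abstract: a data digest cache, a content cache, and an address-based data cache. The state discretization, action set, reward weights, and the `measure_interval` stub are illustrative assumptions, not values or interfaces from the paper.

```python
# Hedged sketch: tabular Q-learning over cache partition shifts.
# All parameter values and helper names here are hypothetical.
import random
from collections import defaultdict

ACTIONS = [
    (+1, -1, 0), (-1, +1, 0), (0, +1, -1),
    (0, -1, +1), (+1, 0, -1), (-1, 0, +1), (0, 0, 0),
]  # shift one unit of capacity between two segments, or hold

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed hyperparameters
Q = defaultdict(float)  # Q[(state, action_index)]

def choose_action(state):
    """Epsilon-greedy selection over the partition-shift actions."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])

def apply_action(partition, action, total_units=10):
    """Move one capacity unit between segments, keeping sizes in range."""
    new = tuple(p + d for p, d in zip(partition, ACTIONS[action]))
    return new if all(0 <= p <= total_units for p in new) else partition

def reward(hit_rate, drr, w_hit=1.0, w_drr=0.5):
    """Illustrative reward: weighted mix of observed hit rate and DRR."""
    return w_hit * hit_rate + w_drr * drr

def q_update(state, action, r, next_state):
    """Standard one-step Q-learning update."""
    best_next = max(Q[(next_state, a)] for a in range(len(ACTIONS)))
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])

def measure_interval(partition):
    """Stand-in for running the workload one epoch under `partition`
    and measuring its hit rate and data reduction rate."""
    return random.random(), random.random()

partition = (4, 3, 3)  # (digest, content, address) units out of 10
for _ in range(100):
    state = partition
    action = choose_action(state)
    partition = apply_action(partition, action)
    hit_rate, drr = measure_interval(partition)
    q_update(state, action, reward(hit_rate, drr), partition)
```

SARSA or an actor-critic agent, as also evaluated in the paper, would slot into the same control loop by replacing the update rule; the surrounding partition and measurement machinery would be unchanged.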
