TTLCache: Taming Latency in Erasure-Coded Storage Through TTL Caching

Abubakr O. Al-Abbasi, Vaneet Aggarwal

doi:10.1109/tnsm.2020.2998175

Abstract

Distributed storage systems are known to be susceptible to long response times, and higher latency reduces customer satisfaction. Two methods can reduce latency in such systems: storing content redundantly at the storage nodes, and adding a cache close to end users. Redundancy can be added using an erasure code, which offers high resiliency with low storage overhead. It is important to quantify the performance of distributed storage systems in the presence of redundancy and caching, which is the focus of this work. This paper proposes a framework for quantifying and jointly optimizing mean and tail latency in erasure-coded storage systems with edge-caching capabilities. A novel time-to-live-based policy for caching contents in erasure-coded storage systems, called TTLCache, is proposed. Using the TTLCache policy and probabilistic server-selection techniques, bounds on mean latency and latency tail probability (LTP) are characterized. A convex combination of both metrics is optimized over the choices of probabilistic scheduling and TTLCache parameters using an efficient algorithm. In all tested cases, the experimental results show the superiority of our approach compared to state-of-the-art algorithms and some competitive baselines. An implementation in a real cloud environment is further used to validate the results.
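As a rough illustration of the two ingredients named in the abstract, the sketch below shows a generic time-to-live cache and a convex combination of two latency metrics. It is not the TTLCache policy or the optimization algorithm from the paper: the class `SimpleTTLCache`, the function `weighted_objective`, the weight `theta`, and the fixed per-entry expiry rule are all assumptions made for illustration only.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class SimpleTTLCache:
    """Generic TTL cache (hypothetical): an entry expires `ttl` seconds after it is stored."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable clock so expiry can be tested deterministically
        self._store: Dict[str, Tuple[Any, float]] = {}

    def put(self, key: str, value: Any) -> None:
        # Record the value together with its absolute expiry time.
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None  # cache miss: key was never stored or already evicted
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # lazily evict the expired entry
            return None
        return value


def weighted_objective(mean_latency: float, tail_prob: float, theta: float) -> float:
    """Convex combination of two metrics; `theta` in [0, 1] is an assumed weight."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta * mean_latency + (1.0 - theta) * tail_prob
```

With `theta = 1` the objective reduces to mean latency alone, and with `theta = 0` it reduces to the tail probability alone; intermediate values trade one off against the other.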

Full Text