Limousine: Blending Learned and Classical Indexes to Self-Design Larger-than-Memory Cloud Storage Engines

Subarna Chatterjee,Stratos Idreos,Lev Kruglyak,Mark F Pekala

doi:10.1145/3639302

Abstract

We present Limousine, a self-designing key-value storage engine, that can automatically morph to the near-optimal storage engine architecture shape given a workload, a cloud budget, and target performance. At its core, Limousine identifies the fundamental design principles of storage engines as combinations of learned and classical data structures that collaborate through algorithms for data storage and access. By unifying these principles over diverse hardware and three major cloud providers (AWS, GCP, and Azure), Limousine creates a massive design space of quindecillion (1048) storage engine designs the vast majority of which do not exist in literature or industry. Limousine contains a distribution-aware IO model to accurately evaluate any candidate design. Using these models, Limousine searches within the exhaustive design space to construct a navigable continuum of designs connected along a Pareto frontier of cloud cost and performance. If storage engines contain learned components, Limousine also introduces efficient lazy write algorithms to optimize the holistic read-write performance. Once the near-optimal design is decided for the given context, Limousine automatically materializes the corresponding design in Rust code. Using the YCSB benchmark, we demonstrate that storage engines automatically designed and generated by Limousine scale better by up to 3 orders of magnitude when compared with state-of-the-art industry-leading engines such as RocksDB, WiredTiger, FASTER, and Cosine, over diverse workloads, data sets, and cloud budgets.

Full Text