One Seed, Two Birds: A Unified Learned Structure for Exact and Approximate Counting

Yingze Li,Hongzhi Wang,Xianglong Liu

doi:10.1145/3639270

Abstract

The modern database has many precise and approximate counting requirements. Nevertheless, a solitary multidimensional index or cardinality estimator is insufficient to cater to the escalating demands across all counting scenarios. Such approaches are constrained either by query selectivity or by the compromise between query accuracy and efficiency. We propose CardIndex, a unified learned structure to solve the above problems. CardIndex serves as a versatile solution that not only functions as a multidimensional learned index for accurate counting but also doubles as an adaptive cardinality estimator, catering to varying counting scenarios with diverse requirements for precision and efficiency. Rigorous experimentation has showcased its superiority. Compared to the state-of-the-art (SOTA) autoregressive data-driven cardinality estimation baselines, our structure achieves training and updating times that are two orders of magnitude faster. Additionally, our CPU-based query estimation latency surpasses GPU-based baselines by two to three times. Notably, the estimation accuracy of low-selectivity queries is up to 314 times better than the current SOTA estimator. In terms of indexing tasks, the construction speed of our structure is two orders of magnitude faster than RSMI and 1.9 times faster than R-tree. Furthermore, it exhibits a point query processing speed that is 3%-17% times faster than RSMI and 1.07 to 2.75 times faster than R-tree and KDB-tree. Range queries under specific loads are 20% times faster than the SOTA indexes.

Full Text