Abstract
Cardinality estimation is a crucial component of query optimizers. After decades of research, autoregressive models have emerged as remarkably accurate cardinality estimators. However, when queries involve attributes with large domain sizes, autoregressive estimators struggle to capture the data distribution accurately, which degrades their performance. Furthermore, these models often exhibit significant errors on queries with low-selectivity predicates. To address these challenges, we propose AdaCard, a self-adaptive cardinality estimator. First, we employ a self-adaptive smoothing-factor selection strategy that adjusts the original data to varying degrees, mitigating the impact of large domain sizes. Second, to correct errors stemming from Monte Carlo sampling, we use resampling to refine the handling of low-selectivity predicates, thereby improving accuracy. We compare AdaCard with mainstream baselines on four real-world benchmarks. The results show that AdaCard has the lowest tail estimation error and improves accuracy by nearly 10$\times$ over the second-best method, with similar latency and model size.
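Autoregressive estimators commonly answer range predicates with progressive (Monte Carlo) sampling, and low-selectivity queries tend to produce estimates with high relative variance. The Python sketch below illustrates that setting on a toy two-column model given as explicit probability tables. The tables, the function names (`restricted`, `progressive_weights`, `estimate_selectivity`), and the stopping rule (keep drawing batches until the relative standard error falls below a target) are hypothetical assumptions for illustration only. It shows a generic adaptive-sampling remedy, not AdaCard's resampling procedure, which the abstract does not specify.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy stand-in for a trained autoregressive model over two columns:
# P(X1) and P(X2 | X1) as explicit tables (a real estimator would query a network).
P_X1: Dict[int, float] = {0: 0.5, 1: 0.3, 2: 0.2}
P_X2_GIVEN_X1: Dict[int, Dict[int, float]] = {
    0: {0: 0.9, 1: 0.1},
    1: {0: 0.5, 1: 0.5},
    2: {0: 0.01, 1: 0.99},
}


def restricted(dist: Dict[int, float], allowed: Set[int],
               rng: random.Random) -> Tuple[float, Optional[int]]:
    """Return the probability mass of `allowed` and one value drawn from
    `dist` renormalized to `allowed`, or (0.0, None) if that mass is zero."""
    values = [v for v in dist if v in allowed]
    mass = sum(dist[v] for v in values)
    if mass == 0.0:
        return 0.0, None
    return mass, rng.choices(values, weights=[dist[v] for v in values])[0]


def progressive_weights(r1: Set[int], r2: Set[int], n: int,
                        rng: random.Random) -> List[float]:
    """Draw n progressive samples; each weight is an unbiased estimate of
    P(X1 in r1, X2 in r2) under the toy model."""
    weights = []
    for _ in range(n):
        m1, x1 = restricted(P_X1, r1, rng)
        if x1 is None:
            weights.append(0.0)
            continue
        # Last column: only its in-range conditional mass is needed, not a draw.
        m2 = sum(p for v, p in P_X2_GIVEN_X1[x1].items() if v in r2)
        weights.append(m1 * m2)
    return weights


def estimate_selectivity(r1: Set[int], r2: Set[int], batch: int = 100,
                         max_batches: int = 50, rel_se_target: float = 0.05,
                         seed: int = 0) -> Tuple[float, int]:
    """Hypothetical 'resample until stable' rule: keep adding batches of
    samples until the relative standard error of the mean is at most
    rel_se_target or the sample budget is exhausted."""
    rng = random.Random(seed)
    weights: List[float] = []
    mean = 0.0
    for _ in range(max_batches):
        weights.extend(progressive_weights(r1, r2, batch, rng))
        n = len(weights)
        mean = sum(weights) / n
        if mean == 0.0:
            continue  # no mass observed yet; draw another batch
        var = sum((w - mean) ** 2 for w in weights) / max(n - 1, 1)
        if (var / n) ** 0.5 / mean <= rel_se_target:
            break
    return mean, len(weights)


if __name__ == "__main__":
    # Exact value under the toy model: 0.3 * 0.5 + 0.2 * 0.01 = 0.152
    est, used = estimate_selectivity({1, 2}, {0})
    print(f"estimate={est:.4f} using {used} samples")
```

Queries whose per-sample weights vary widely relative to their mean consume more batches before the stopping rule is met, while a query whose weights are all equal stops after the first batch.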