Abstract

Cardinality estimation is a crucial component of query optimizers. After decades of research, autoregressive models have demonstrated remarkable accuracy on this task. However, when queries involve attributes with large domain sizes, autoregressive estimators struggle to capture the data distribution accurately, leading to poor performance. These models also often exhibit significant errors on queries with low-selectivity predicates. To address these challenges, we propose AdaCard, a self-adaptive cardinality estimator. First, we employ a self-adaptive smoothing factor selection strategy that adjusts the original data to varying degrees, thereby mitigating the impact of large domain sizes. Second, to correct errors stemming from Monte Carlo sampling, we use resampling to refine the handling of low-selectivity predicates, thereby improving accuracy. We evaluate AdaCard against mainstream baselines on four real-world benchmarks. The results show that our estimator has the lowest tail estimation error and improves accuracy by nearly 10$\times$ over the second-best method, with similar latency and model size.
