Abstract
An important feature of modern query optimizers is the ability to produce a query plan that is optimal for the underlying data set. This requires estimating the cardinalities and computational costs of intermediate query plan nodes, both of which depend heavily on the query shape and the underlying data distribution. Traditional approaches collect statistics on base tables and derive cardinalities and computational costs inside the optimizer, which is error-prone for complex query shapes. This paper presents Presto's novel history-based optimization (HBO) framework, which collects execution histories and uses them to optimize similar queries in the future. The framework produces accurate estimates for complex query shapes in a lightweight, automated manner, and adapts automatically to changes in underlying data distributions. We present the design and implementation of the HBO framework, describe its use in various optimization rules, and detail the implementation of the statistics store on top of a Redis key-value store. We also present the results of running HBO in production at two large data infrastructure organizations (Meta and Uber).