Abstract
The query optimizer is at the heart of database systems. The cost-based optimizer studied in this paper is adopted in almost all current database systems. A cost-based optimizer uses a plan enumeration algorithm to find a (sub)plan, uses a cost model to obtain the cost of that plan, and selects the plan with the lowest cost. In the cost model, cardinality, the number of tuples passing through an operator, plays a crucial role. Because of inaccurate cardinality estimation, errors in the cost model, and the huge plan space, the optimizer cannot find the optimal execution plan for a complex query in a reasonable time. In this paper, we first study in depth the causes behind these limitations. Next, we review the techniques used to improve the quality of the three key components of the cost-based optimizer: cardinality estimation, the cost model, and plan enumeration. We also provide our insights on future directions for each of these aspects.
Highlights
The query optimizer is at the heart of relational database management systems (RDBMSes) and some big data processing engines, e.g., SCOPE [7]
We focus on the query optimizer and give a comprehensive survey of its three key components
Cardinality estimation is the ability to estimate the number of tuples generated by an operator; the estimate is used in the cost model to calculate the cost of that operator
Summary
The query optimizer is at the heart of relational database management systems (RDBMSes) and some big data processing engines, e.g., SCOPE [7]. Given a query written in a declarative language (e.g., SQL), the optimizer finds the most efficient execution plan (called the physical plan) and feeds it to the executor. Provided that the estimated cardinality and cost are accurate, and that the plan enumeration component can efficiently walk through the huge search space, this architecture can obtain the optimal execution plan in a reasonable time.
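To make the interplay of the three components concrete, the following is a minimal sketch, not the method of any particular system. It assumes hypothetical table sizes and join selectivities (`TABLE_ROWS`, `SELECTIVITY`), estimates cardinality under an independence assumption, uses the sum of intermediate result sizes as a stand-in cost model, and enumerates only left-deep join orders by brute force. All names and numbers are illustrative assumptions.

```python
from itertools import permutations

# Hypothetical base-table cardinalities (number of tuples); illustrative only.
TABLE_ROWS = {"A": 1000, "B": 100, "C": 10}

# Hypothetical join selectivities; pairs not listed have no join predicate
# and are treated as cross products (selectivity 1.0).
SELECTIVITY = {
    frozenset({"A", "B"}): 0.01,
    frozenset({"B", "C"}): 0.1,
}


def estimate_cardinality(joined, new_table, current_rows):
    """Cardinality estimation: output rows of joining new_table to the
    already-joined tables, assuming all predicates are independent."""
    rows = current_rows * TABLE_ROWS[new_table]
    for table in joined:
        rows *= SELECTIVITY.get(frozenset({table, new_table}), 1.0)
    return rows


def plan_cost(order):
    """Cost model: sum of the estimated sizes of all intermediate results
    of a left-deep plan that joins the tables in the given order."""
    joined = [order[0]]
    rows = TABLE_ROWS[order[0]]
    cost = 0.0
    for table in order[1:]:
        rows = estimate_cardinality(joined, table, rows)
        cost += rows
        joined.append(table)
    return cost


def best_plan(tables):
    """Plan enumeration: try every left-deep join order and keep the one
    with the lowest estimated cost."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    plan = best_plan(["A", "B", "C"])
    print(plan, plan_cost(plan))
```

With these assumed numbers, the cheapest orders start by joining the two small tables B and C, while any order that begins with the cross product of A and C is roughly ten times more expensive. The brute-force enumeration grows factorially with the number of tables, which illustrates why the size of the plan space is one of the limitations named above, and a wrong entry in `SELECTIVITY` would mislead the cost comparison in the same way that cardinality estimation errors do.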