Abstract

The query optimizer is at the heart of database systems. The cost-based optimizer studied in this paper is adopted in almost all current database systems. A cost-based optimizer uses a plan enumeration algorithm to generate candidate (sub)plans, applies a cost model to estimate the cost of each plan, and selects the plan with the lowest cost. In the cost model, cardinality, the number of tuples passing through an operator, plays a crucial role. Because of inaccurate cardinality estimation, errors in the cost model, and the huge plan space, the optimizer cannot find the optimal execution plan for a complex query in a reasonable time. In this paper, we first study in depth the causes behind these limitations. Next, we review the techniques used to improve the quality of the three key components of the cost-based optimizer: cardinality estimation, the cost model, and plan enumeration. We also provide our insights on future directions for each of these aspects.

Highlights

  • The query optimizer is at the heart of relational database management systems (RDBMSes) and some big data processing engines, e.g., SCOPE [7]

  • We focus on the query optimizer and give a comprehensive survey on the three key components of the optimizer

  • Cardinality estimation is the ability to estimate the number of tuples generated by an operator; the estimate is used in the cost model to calculate the cost of that operator


Summary

Introduction

The query optimizer is at the heart of relational database management systems (RDBMSes) and some big data processing engines, e.g., SCOPE [7]. Given a query written in a declarative language (e.g., SQL), the optimizer finds the most efficient execution plan (called the physical plan) and feeds it to the executor. Provided that the estimated cardinality and cost are accurate, and that the plan enumeration component can efficiently walk through the huge search space, the cost-based architecture can obtain the optimal execution plan in a reasonable time.
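
The interplay of the three components can be illustrated with a minimal sketch. It is not the method of any particular system: the table names, row counts, and join selectivities are hypothetical, cardinality is estimated under an assumed independence of join predicates, the cost model is a toy one (the sum of estimated intermediate result sizes), and plan enumeration is a brute-force search over left-deep join orders.

```python
from itertools import permutations

# Hypothetical base-table row counts (illustrative values only).
TABLE_ROWS = {"A": 1000, "B": 100, "C": 10}

# Hypothetical join selectivities; pairs without an entry form a cross product.
JOIN_SELECTIVITY = {
    frozenset({"A", "B"}): 0.01,
    frozenset({"B", "C"}): 0.1,
}


def estimate_cardinality(tables):
    """Estimate the rows produced by joining `tables`.

    Assumes join predicates are independent, so selectivities multiply.
    """
    rows = 1.0
    for table in tables:
        rows *= TABLE_ROWS[table]
    for pair, selectivity in JOIN_SELECTIVITY.items():
        if pair <= set(tables):
            rows *= selectivity
    return rows


def plan_cost(order):
    """Toy cost model: sum of estimated sizes of every intermediate join result."""
    return sum(estimate_cardinality(order[:i]) for i in range(2, len(order) + 1))


def best_plan(tables):
    """Plan enumeration by brute force over all left-deep join orders."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    plan = best_plan(["A", "B", "C"])
    print(plan, plan_cost(plan))
```

With these assumed numbers, the orders that join B and C first come out cheapest, because that join produces the smallest intermediate result. The brute-force search visits n! orders for n tables, which is one way to see why the plan space becomes unmanageable for complex queries, and any error in the assumed selectivities propagates directly into the chosen plan.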

A Survey on Advancing the DBMS Query Optimizer