Abstract

It is known that optimizing join queries based on average selectivities is suboptimal in highly correlated databases. In such databases, relations are naturally divided into partitions, each with substantially different statistical characteristics. It is therefore compelling to discover such data partitions during query optimization and to create multiple plans for a given query, each plan being optimal for a particular combination of data partitions. This scenario calls for sharing state among plans, so that common intermediate results are not recomputed. We study this problem in a setting with a routing-based query execution engine based on eddies. Eddies naturally encapsulate horizontal partitioning and maximal state sharing across multiple plans. The purpose of this paper is to achieve faster execution times than traditional optimization when correlations are high, while maintaining the same performance when correlations are low.


Summary

Introduction

It is known that optimizing join queries based on average selectivities is suboptimal in highly correlated databases. In such databases, relations are naturally divided into partitions, each with substantially different statistical characteristics. Traditional query optimizers pick one execution plan per query, based on first-order statistics about the underlying data. The presence of data correlations makes selectivity estimation harder, but it also offers opportunities for more effective query optimization. When data correlations are present, the input relations are naturally divided into partitions, each with very different statistical characteristics. It is therefore attractive to create multiple plans per query, each plan being optimized for a different combination of data partitions. The combined cost of the resulting plans can be smaller than the cost of any possible monolithic plan.
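The cost argument above can be illustrated with a small arithmetic sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the relation names R, S and T, the two partitions R1 and R2 with their fanouts, and the toy cost model that simply counts join probes. It also ignores state sharing among plans. It only shows how one join order per partition can beat the best single join order when the partitions have opposite statistics.

```python
from itertools import permutations

# Hypothetical statistics (not from the paper): the number of tuples in each
# partition of R and the average number of matches (fanout) that one tuple
# finds in S and in T. The two partitions have opposite characteristics.
PARTITIONS = {
    "R1": {"tuples": 1000, "fanout": {"S": 0.25, "T": 2.0}},
    "R2": {"tuples": 1000, "fanout": {"S": 2.0, "T": 0.25}},
}
JOIN_ORDERS = list(permutations(["S", "T"]))


def probe_cost(tuples, fanout, order):
    """Toy cost model: count probes when tuples are pipelined through `order`."""
    cost, flowing = 0.0, float(tuples)
    for relation in order:
        cost += flowing               # every flowing tuple probes `relation`
        flowing *= fanout[relation]   # its matches continue to the next join
    return cost


def best_monolithic(partitions):
    """Cheapest single join order applied to all partitions: (cost, order)."""
    return min(
        (sum(probe_cost(p["tuples"], p["fanout"], order)
             for p in partitions.values()), order)
        for order in JOIN_ORDERS
    )


def best_per_partition(partitions):
    """Cheapest join order chosen separately per partition: (cost, plans)."""
    total, plans = 0.0, {}
    for name, p in partitions.items():
        cost, order = min(
            (probe_cost(p["tuples"], p["fanout"], order), order)
            for order in JOIN_ORDERS
        )
        total += cost
        plans[name] = order
    return total, plans


if __name__ == "__main__":
    print("monolithic   :", best_monolithic(PARTITIONS))
    print("per partition:", best_per_partition(PARTITIONS))
```

With these made-up numbers, either single join order costs 4250 probes, while probing S first for R1 and T first for R2 costs 2500 in total. An optimizer that only sees the average fanouts (1.125 for both S and T) cannot distinguish the two orders at all.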

