Abstract

Join selectivity estimation is a fundamental problem in query optimization: it aims to estimate the cardinality of the result returned by a join query. While join selectivity estimation has been studied extensively for relational databases, few studies address cross-model joins in multi-model databases, such as relation-tree joins between a relational table and a tree-structured document like a BSON file in MongoDB. Because of MongoDB's popularity, many applications now use both MongoDB and MySQL to organize heterogeneous data. It is therefore necessary to devise efficient approaches for processing relation-tree joins that span the relational and tree models. In this paper, we present an effective and efficient approach to estimating the join selectivity of relation-tree joins, which consists of a value join estimation and a structural join estimation. In particular, we propose a two-level sampling method that samples the relational tuples and the tree nodes at two levels. We then apply the discrete learning algorithm to the tree node samples to estimate the join value distribution of the tree nodes. This mechanism captures the correlation between relational tuples and tree nodes and improves the estimation accuracy. We conduct experiments on the DBLP dataset and compare our approach with existing solutions; the results suggest that our proposal is both effective and efficient.
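To make the quantity being estimated concrete, the following is a minimal sketch of a generic sampling-based estimator for the value-join part only. It is not the two-level sampling method or the discrete learning step proposed in the paper, and it ignores the structural join entirely. It uses plain independent Bernoulli sampling as a stand-in, and all function and parameter names (`exact_join_size`, `sampled_join_estimate`, `selectivity`, `rate_rel`, `rate_node`) are hypothetical.

```python
import random
from collections import Counter


def exact_join_size(rel_values, node_values):
    """Exact number of (tuple, node) pairs whose join values are equal."""
    node_counts = Counter(node_values)
    return sum(node_counts[v] for v in rel_values)


def sampled_join_estimate(rel_values, node_values, rate_rel, rate_node, seed=0):
    """Estimate the value-join size from independent Bernoulli samples.

    Assumption: each relational tuple is kept with probability rate_rel and
    each tree node with probability rate_node, independently. The join size
    of the two samples is then scaled up by 1 / (rate_rel * rate_node).
    """
    rng = random.Random(seed)
    rel_sample = [v for v in rel_values if rng.random() < rate_rel]
    node_sample = [v for v in node_values if rng.random() < rate_node]
    return exact_join_size(rel_sample, node_sample) / (rate_rel * rate_node)


def selectivity(join_size, n_rel, n_node):
    """Join selectivity: join size divided by the size of the cross product."""
    return join_size / (n_rel * n_node)


# Toy data: join values of four relational tuples and four tree nodes.
rel = ["a", "a", "b", "c"]
nodes = ["a", "b", "b", "d"]

true_size = exact_join_size(rel, nodes)
true_selectivity = selectivity(true_size, len(rel), len(nodes))
estimate = sampled_join_estimate(rel, nodes, rate_rel=0.5, rate_node=0.5)
```

A naive estimator like this treats the two samples as independent, so on skewed data its variance can be large. The abstract's point about capturing the correlation between relational tuples and tree nodes is aimed at exactly that weakness.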
