Abstract

Traditional multidimensional histograms, which are widely used in RDBMS query optimizers to estimate the cardinality of conjunctive range predicates, assume that attributes may be correlated rather than relying on the seemingly plausible attribute value independence (AVI) assumption. However, they do not further distinguish between different degrees of correlation among attributes. Based on accurate measurements of the data distribution, the correlation coefficients between attributes, and the density of the value domain, the authors propose COCA-Hist, a set of multidimensional histograms, each optimal for a different kind of data distribution. They also analyze the worst cases of traditional MHist-2 histograms and identify effective ways to mitigate them. Experiments compare the accuracy and performance of COCA-Hist with those of MHist-2, GENHist, and STHoles. The results show that COCA-Hist outperforms MHist-2 in both accuracy and performance, in the average case as well as in the worst case. When attributes exhibit soft functional dependencies, COCA-Hist is much better than GENHist in both accuracy and construction time, by orders of magnitude. Under limited space budgets, COCA-Hist is built an order of magnitude faster than STHoles. Although STHoles achieves good accuracy when the space budget is sufficient, COCA-Hist achieves somewhat better accuracy than STHoles on average.
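To make the contrast between the AVI assumption and a multidimensional histogram concrete, the following sketch estimates a conjunctive range predicate on strongly correlated synthetic data. It is a minimal illustration under stated assumptions, not COCA-Hist or any of the compared methods. The data generator, the 10 x 10 equi-width buckets, the uniform-spread assumption inside each bucket, and all function names (`make_rows`, `avi_estimate`, `build_grid`, `grid_estimate`) are hypothetical choices for this example. `pearson` computes one standard correlation coefficient; the paper's own correlation measure may differ.

```python
import random

# Hypothetical illustration only: a plain equi-width 2-D grid histogram versus the
# attribute value independence (AVI) assumption. This is NOT COCA-Hist, MHist-2,
# GENHist, or STHoles.


def make_rows(n=10_000, seed=42, noise=5.0):
    """Synthetic (x, y) pairs on [0, 100) where y tracks x plus Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 100.0)
        y = min(99.999, max(0.0, x + rng.gauss(0.0, noise)))  # clamp into [0, 100)
        rows.append((x, y))
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def in_range(v, r):
    """Half-open range test r[0] <= v < r[1]."""
    return r[0] <= v < r[1]


def true_count(rows, xr, yr):
    """Exact cardinality of the conjunctive predicate x in xr AND y in yr."""
    return sum(1 for x, y in rows if in_range(x, xr) and in_range(y, yr))


def avi_estimate(rows, xr, yr):
    """AVI: n * sel(x) * sel(y), using exact 1-D selectivities to isolate the AVI error."""
    n = len(rows)
    sel_x = sum(1 for x, _ in rows if in_range(x, xr)) / n
    sel_y = sum(1 for _, y in rows if in_range(y, yr)) / n
    return n * sel_x * sel_y


def build_grid(rows, lo=0.0, hi=100.0, b=10):
    """Equi-width b x b grid: counts[i][j] = rows with x in column i and y in row j."""
    w = (hi - lo) / b
    counts = [[0] * b for _ in range(b)]
    for x, y in rows:
        i = min(int((x - lo) / w), b - 1)
        j = min(int((y - lo) / w), b - 1)
        counts[i][j] += 1
    return counts, lo, w


def grid_estimate(hist, xr, yr):
    """Sum bucket counts weighted by the overlapped fraction (uniform spread in a bucket)."""
    counts, lo, w = hist
    est = 0.0
    for i, col in enumerate(counts):
        for j, c in enumerate(col):
            x0, y0 = lo + i * w, lo + j * w
            fx = max(0.0, min(xr[1], x0 + w) - max(xr[0], x0)) / w
            fy = max(0.0, min(yr[1], y0 + w) - max(yr[0], y0)) / w
            est += c * fx * fy
    return est


if __name__ == "__main__":
    rows = make_rows()
    xs, ys = [x for x, _ in rows], [y for _, y in rows]
    q = (0.0, 25.0)
    print("correlation:", round(pearson(xs, ys), 3))
    print("true:", true_count(rows, q, q))
    print("AVI estimate:", round(avi_estimate(rows, q, q)))
    print("2-D grid estimate:", round(grid_estimate(build_grid(rows), q, q)))
```

Because y closely tracks x, the AVI product n·sel(x)·sel(y) substantially underestimates this diagonal query, while the grid histogram, which stores the joint distribution, stays much closer to the true count. The sketch uses a single fixed bucketing for every data set; it does not adapt the histogram to the measured degree of correlation.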
