Abstract

The skyline query is important in the database community. In recent years, the researches on incomplete data have been increasingly considered, especially for the skyline query. However, the existing skyline definition on incomplete data cannot provide users with valuable references. In this paper, we propose a novel skyline definition utilizing probabilistic model on incomplete data where each point has a probability to be in the skyline. In particular, it returns K points with the highest skyline probabilities. In addition, we propose incomplete models and estimate probability density functions of missing values on independent, correlated, and anti-correlated distributions, respectively. Meanwhile, it is a big challenge to compute probabilistic skyline on incomplete data. We propose three efficient algorithms SPISkyline, SPCSkyline, and SPASkyline for probabilistic skyline computation on incomplete data complying with independent, correlated, and anti-correlated distributions, respectively. They employ pruning strategy, optimization of the process of probability computation, and sorting technique to improve the efficiency of probabilistic skyline computation on incomplete data. Our experimental results demonstrate that our proposed concept of probabilistic skyline is an effective method to tackle skyline query on incomplete data and our algorithms are tens of times faster than the naive algorithm on both synthetic and real datasets.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.