In many applications involving multiple criteria optimal decision making, users may often want to make a personal trade-off among all optimal solutions for selecting one object that fits best their personal needs. As a key feature, the skyline in a multidimensional space provides the minimum set of candidates for such purposes by removing all points not preferred by any (monotonic) utility/scoring functions; that is, the skyline removes all objects not preferred by any user no matter how their preferences vary. Driven by many recent applications with uncertain data, the probabilistic skyline model is proposed to retrieve uncertain objects based on skyline probabilities. Nevertheless, skyline probabilities cannot capture the preferences of monotonic utility functions. Motivated by this, in this article we propose a novel skyline operator, namely stochastic skylines. In the light of the expected utility principle, stochastic skylines guarantee to provide the minimum set of candidates to optimal solutions over a family of utility functions. We first propose the lskyline operator based on the lower orthant orders . lskyline guarantees to provide the minimum set of candidates to the optimal solutions for the family of monotonic multiplicative utility functions. While lskyline works very effectively for the family of multiplicative functions, it may miss optimal solutions for other utility /scoring functions (e.g., linear functions). To resolve this, we also propose a general stochastic skyline operator, gskyline , based on the usual orders . gskyline provides the minimum candidate set to the optimal solutions for all monotonic functions. For the first time regarding the existing literature, we investigate the complexities of determining a stochastic order between two uncertain objects whose probability distributions are described discretely . We firstly show that determining the lower orthant order is NP-complete with respect to the dimensionality; consequently the problem of computing lskyline is NP-complete. We also show an interesting result as follows. While the usual order involves more complicated geometric forms than the lower orthant order, the usual order may be determined in polynomial time regarding all the inputs, including the dimensionality; this implies that gskyline can be computed in polynomial time. A general framework is developed for efficiently and effectively retrieving lskyline and gskyline from a set of uncertain objects, respectively, together with efficient and effective filtering techniques. Novel and efficient verification algorithms are developed to efficiently compute lskyline over multidimensional uncertain data, which run in polynomial time if the dimensionality is fixed, and to efficiently compute gskyline in polynomial time regarding all inputs. We also show, by theoretical analysis and experiments, that the sizes of lskyline and gskyline are both quite similar to that of conventional skyline over certain data. Comprehensive experiments demonstrate that our techniques are efficient and scalable regarding both CPU and IO costs.
Read full abstract