The skyline operator is a useful tool for multi-criteria decision making in many applications. Uncertainty is inherent in real applications for various reasons. In this paper, we consider the problem of efficiently computing probabilistic skylines against the most recent N uncertain elements seen so far in a data stream. Specifically, we study the problem in the n-of-N model; that is, computing the probabilistic skyline for the most recent n (n ≤ N) elements, where an element is a probabilistic skyline element if its skyline probability is not below a given probability threshold q. First, we develop an effective pruning technique to minimize the number of uncertain elements that must be kept. We show that, on average, storing only O(log^d N) uncertain elements from the most recent N elements is sufficient to support the precise computation of all probabilistic n-of-N skyline queries in a d-dimensional space if the data distribution on each dimension is independent. We then propose a novel encoding scheme, together with efficient update techniques, so that the cost of computing a probabilistic n-of-N skyline query in a d-dimensional space is reduced to O(d log log N + s) if the data distribution is independent, where s is the number of skyline points. A trigger-based technique is provided to process continuous n-of-N skyline queries. Extensive experiments demonstrate that the new techniques can support on-line probabilistic skyline query computation over rapid uncertain data streams.
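To make the query definition concrete, the following is a minimal brute-force sketch of a single probabilistic n-of-N skyline query. It is not the pruning or encoding technique summarized above, and it runs in quadratic time. It rests on assumptions the abstract does not state: each element carries an occurrence probability, smaller values are preferred on every dimension, and the skyline probability of an element is its own occurrence probability multiplied by the probability that none of its dominating elements in the window occurs. The names `dominates`, `skyline_probability` and `n_of_N_skyline` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed element layout: (point coordinates, occurrence probability).
Element = Tuple[Sequence[float], float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every dimension and strictly
    better on at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(index: int, window: Sequence[Element]) -> float:
    """Assumed definition: the element occurs and no dominating element occurs."""
    point, prob = window[index]
    result = prob
    for j, (other, other_prob) in enumerate(window):
        if j != index and dominates(other, point):
            result *= 1.0 - other_prob
    return result


def n_of_N_skyline(stream: Sequence[Element], n: int, q: float) -> List[Element]:
    """Brute-force query over the most recent n elements with threshold q."""
    if n <= 0:
        return []
    window = list(stream[-n:])
    return [
        window[i]
        for i in range(len(window))
        if skyline_probability(i, window) >= q
    ]


# Illustrative data only: four 2-dimensional uncertain elements.
stream = [((1, 1), 0.5), ((2, 2), 0.9), ((0, 3), 0.8), ((3, 0), 0.4)]
recent_three = n_of_N_skyline(stream, n=3, q=0.5)
all_four = n_of_N_skyline(stream, n=4, q=0.5)
```

In this toy stream, the element (2, 2) qualifies when n = 3 but not when n = 4, because the older element (1, 1) dominates it and lowers its skyline probability to 0.45. This illustrates why the answer depends on n, and why a naive approach would need to retain the whole window.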