Discovering the Representative Skyline Over a Sliding Window

Mei Bai, Junchang Xin, Luming Zhang, Roger Zimmermann, Xindong Wu, Guoren Wang, Ye Yuan

doi:10.1109/tkde.2016.2546242

Abstract

A representative skyline contains $k$ skyline points that can represent its corresponding full skyline. The existing measuring criteria of $k$ representative skylines are specifically designed for static data, and they cannot effectively handle streaming data. In this paper, we focus on the problem of calculating the $k$ representative skyline over data streams. First, we propose a new criterion to choose $k$ skyline points as the $k$ representative skyline for data stream environments, termed the $k$ largest dominance skyline ($k$-LDS), which is representative of the entire data set and is highly stable over the streaming data. Second, we propose an efficient exact algorithm, called the Prefix-based Algorithm (PBA), to solve the $k$-LDS problem in a 2-dimensional space. The time complexity of PBA is only $\mathcal{O}((M-k)\times k)$, where $M$ is the size of the full skyline set. Third, the $k$-LDS problem for a $d$-dimensional ($d\ge 3$) space turns out to be very complex. Therefore, a greedy algorithm is designed to answer $k$-LDS queries. To further accelerate the calculation, we propose an $\epsilon$-greedy algorithm which can achieve an approximation factor of $\frac{1}{(1+\epsilon)}(1-\frac{1}{\sqrt{e}})$. Experimental results on both synthetic and real-world data show that our $k$-LDS significantly outperforms its competitors in data stream environments. Furthermore, we demonstrate that the proposed $\epsilon$-greedy algorithm can solve $k$-LDS efficiently and with competitive accuracy.
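
As a rough illustration of the greedy idea mentioned in the abstract, the sketch below picks $k$ skyline points one at a time, each time taking the point that dominates the most data points not yet covered. It is not the paper's PBA or $\epsilon$-greedy algorithm, and it ignores the sliding window entirely. It assumes that smaller values are better in every dimension and that the quantity being maximized is the number of data points dominated by at least one chosen skyline point. The names `dominates`, `skyline`, and `greedy_k_lds` are hypothetical, chosen only for this example.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[int]:
    """Indices of points that no other point dominates (naive O(n^2) scan)."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for q in points)
    ]


def greedy_k_lds(points: Sequence[Point], k: int) -> Tuple[List[Point], int]:
    """Greedily choose up to k skyline points, each time taking the one
    that dominates the most not-yet-covered points.

    Returns the chosen skyline points and the number of points they
    dominate in total. This is a plain maximum-coverage greedy, used here
    only as an illustrative stand-in.
    """
    sky = skyline(points)
    # For every skyline point, the set of point indices it dominates.
    cover = {
        s: {j for j, q in enumerate(points) if dominates(points[s], q)}
        for s in sky
    }
    chosen: List[int] = []
    covered: Set[int] = set()
    for _ in range(min(k, len(sky))):
        remaining = [s for s in sky if s not in chosen]
        # Marginal gain = newly dominated points if s were added.
        best = max(remaining, key=lambda s: len(cover[s] - covered))
        chosen.append(best)
        covered |= cover[best]
    return [points[s] for s in chosen], len(covered)


data = [(1, 9), (2, 7), (4, 4), (7, 2), (9, 1),
        (3, 8), (5, 5), (6, 6), (8, 3), (8, 8), (5, 9)]
representatives, dominated = greedy_k_lds(data, k=2)
print(representatives, dominated)
```

Precomputing the dominated sets keeps each greedy round to a set difference per candidate. A streaming setting would need those sets maintained incrementally as points enter and leave the window, which this sketch does not attempt.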

Full Text