AbstractIn many practical applications, skyline query is an important operation to return the pareto optimal tuples, which provides a candidate set for the optimum. On massive data, skyline often reports too many results, the users will be overwhelmed and be difficult to find the desired information easily. This paper devises P-skyline to reduce the size of the returned results. Given the approximation factor, P-skyline only generates the prominent skyline results by the definition of p-dominance. To the best of our knowledge, this paper is the first work to study P-skyline problem. This paper first proposes a baseline algorithm, which requires one full table scan to compute the results. It is found that baseline algorithm incurs a relatively high execution cost on massive data. Then, PSTP algorithm is proposed, which consists of two stages: candidate acquisition and refinement. On the presorted table, PSTP utilizes selective retrieval and selective checking to process P-skyline with much lower I/O cost and computation cost. The extensive experimental results, conducted on synthetic and real-life data sets, show that PSTP can compute P-skyline on massive data efficiently.
Read full abstract