In many practical applications, the G-Skyline query is an important operation that returns the best tuple groups, i.e., those not g-dominated by any other tuple group of the same size, from a potentially huge data space. Existing G-Skyline algorithms, however, cannot handle massive data well because of their high I/O cost and high computation cost. This paper proposes a novel algorithm, GPR, which is based on the principles of presorting and reuse, to compute G-Skyline groups on massive data efficiently. GPR executes in two phases: acquiring the candidate tuples and computing the G-Skyline groups. In phase 1, a sublinear-I/O method is proposed to scan the presorted table, and this method is proved to have the early-termination property. For phase 2, this paper devises a basic framework and analyzes its execution cost. The SR strategy is then used to reuse subset computation results effectively, which considerably reduces the execution cost of phase 2. Extensive experiments on synthetic and real-life data sets show that GPR significantly outperforms the existing algorithms.
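To make the notion of g-dominance concrete, the following is a brute-force sketch. It assumes the standard definition from the G-Skyline literature: a group G g-dominates a same-size group G' if the members of G can be paired one-to-one with those of G' so that each member of G dominates or equals its partner and at least one member strictly dominates its partner. It also assumes that smaller attribute values are better. The names `g_dominates` and `g_skyline` are hypothetical. The sketch enumerates every k-subset and is not the GPR algorithm, whose presorting, sublinear-I/O scan, and SR reuse strategy are designed to avoid exactly this exhaustive cost.

```python
from itertools import combinations, permutations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Group = Tuple[Point, ...]


def dominates_or_equal(p: Point, q: Point) -> bool:
    # Assumption: smaller is better on every attribute.
    return all(a <= b for a, b in zip(p, q))


def dominates(p: Point, q: Point) -> bool:
    # p dominates q: no worse everywhere and strictly better somewhere.
    return dominates_or_equal(p, q) and any(a < b for a, b in zip(p, q))


def g_dominates(g1: Group, g2: Group) -> bool:
    """True if some one-to-one pairing of g1 with g2 makes every member of g1
    dominate-or-equal its partner, with at least one strict dominance."""
    if len(g1) != len(g2):
        raise ValueError("g-dominance compares groups of the same size")
    for perm in permutations(g2):  # try every pairing (exponential in k)
        pairs = list(zip(g1, perm))
        if all(dominates_or_equal(p, q) for p, q in pairs) and any(
            dominates(p, q) for p, q in pairs
        ):
            return True
    return False


def g_skyline(points: Sequence[Point], k: int) -> List[Group]:
    """All k-point groups not g-dominated by any other k-point group.
    A group never g-dominates itself, so no self-check is needed."""
    groups = list(combinations(points, k))
    return [g for g in groups if not any(g_dominates(h, g) for h in groups)]


pts = [(1, 4), (4, 1), (2, 5)]
print(g_skyline(pts, 2))
# [((1, 4), (4, 1)), ((1, 4), (2, 5))]
```

In this toy input, {(4, 1), (2, 5)} is g-dominated by {(1, 4), (4, 1)}, while {(1, 4), (2, 5)} survives even though (2, 5) is itself dominated by (1, 4). A G-Skyline group can therefore contain tuples that are not in the ordinary skyline.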