Efficient Parallel Skyline Evaluation Using MapReduce

Ji Zhang, Wei-Shinn Ku, Xunfei Jiang, Xiao Qin

doi:10.1109/tpds.2015.2472016

Abstract

This research develops a two-phase MapReduce solution that efficiently answers skyline queries on large datasets. Unlike existing parallel skyline approaches, our scheme treats data partitioning, filtering, and parallel skyline evaluation as a holistic query process. In the first phase, we apply filtering techniques and angle-based partitioning: unqualified objects are discarded, and the remaining objects are partitioned by their angles to the origin. In the second phase, the local skyline objects of each partition are calculated in parallel, and the global skyline objects are output after a skyline merging process. To improve the parallel local skyline calculation, we propose two partition-aware filtering methods that keep skyline candidates balanced across partitions. The aggressive partition-aware filtering eliminates objects in the partition with the greatest population of candidate objects, whereas the proportional partition-aware filtering slows the growth of each partition's population proportionally. Recognizing the lack of studies that incorporate the MapReduce framework into parallel skyline processing, we propose a partial-presort grid-based partition skyline algorithm that significantly improves the merging skyline computation on large datasets. The presort process can be completed in the shuffle phase with little overhead. Our experimental results show the efficiency and effectiveness of the proposed parallel skyline solution utilizing MapReduce on large-scale datasets.
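The two-phase flow described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: it assumes two-dimensional points with non-negative coordinates where smaller values are better, it uses a plain pairwise skyline in place of the partial-presort grid-based algorithm, and it omits the partition-aware filtering entirely. The function names (`dominates`, `skyline`, `angle_partition`, `two_phase_skyline`) and the default partition count are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict


def dominates(p, q):
    """Return True if p dominates q: no worse in every dimension
    and strictly better in at least one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Pairwise skyline: keep only points not dominated by any other point."""
    result = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue  # p is dominated by a current candidate
        # Drop candidates that p dominates, then keep p.
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


def angle_partition(p, num_partitions):
    """Assign a 2D point to a partition by its angle to the origin.
    Assumes non-negative coordinates, so the angle lies in [0, pi/2]."""
    angle = math.atan2(p[1], p[0])
    idx = int(angle / (math.pi / 2) * num_partitions)
    return min(idx, num_partitions - 1)  # the angle pi/2 falls in the last partition


def two_phase_skyline(points, num_partitions=4):
    # Phase 1 (map-like step): partition points by angle.
    partitions = defaultdict(list)
    for p in points:
        partitions[angle_partition(p, num_partitions)].append(p)
    # Phase 2 (reduce-like step): local skylines per partition,
    # then a merge that computes the skyline of all local candidates.
    candidates = [s for part in partitions.values() for s in skyline(part)]
    return skyline(candidates)


points = [(1, 9), (2, 7), (3, 8), (4, 4), (6, 5), (7, 2), (9, 1), (8, 8)]
global_skyline = sorted(two_phase_skyline(points))
```

In this sketch the merge step is correct for any partitioning, because every global skyline point is also a local skyline point of its own partition. The angle-based split only affects how much work each partition does, which is the balance concern the abstract raises.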

Full Text