The skyline query has recently attracted interest in a wide range of applications, from recommendation systems to computer networks. It returns the dominant data points of a given dataset, that is, the points that are not dominated by any other point. For a low-dimensional dataset, the skyline query may return only a small number of skyline points. However, as the dimensionality of the dataset increases, the number of skyline points tends to increase as well; depending on the data distribution and dimensionality, most of the data points may become skyline points. With the emergence of big data applications, where data distribution and dimensionality pose significant problems, it becomes necessary to select representative skyline points from the resulting skyline. Several methods have focused on extracting representative skyline points, with varying success. However, existing methods must recompute their results whenever the global threshold changes. Moreover, in certain cases, the resulting representative skyline points may not satisfy a user with multiple preferences. In this paper, we therefore propose a new representative skyline query processing method, called representative skyline cluster (RSC), which addresses these problems of the existing methods. Our method uses hierarchical agglomerative clustering to find the exact representative skyline points, which enables us to reduce the re-computation time significantly. We demonstrate the superiority of the proposed method over existing state-of-the-art methods through various types of experiments.
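To make the notion of a skyline point concrete, the following is a minimal sketch of a naive skyline computation. It is not the RSC method proposed in the paper. It assumes that smaller values are preferred in every dimension, and the names `dominates`, `skyline`, and the `hotels` data (price, distance) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension. p dominates q when
    p is no worse than q everywhere and strictly better in at least one
    dimension.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach), both to be minimized.
hotels = [(50, 8), (60, 3), (80, 2), (70, 5), (90, 9)]
result = skyline(hotels)
print(result)
```

In this toy dataset, (70, 5) is dominated by (60, 3) and (90, 9) is dominated by (50, 8), so neither belongs to the skyline. With more dimensions, fewer points are dominated in every attribute at once, which is why the skyline tends to grow and a representative subset becomes useful.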