A top-k query returns the k tuples with the highest (or lowest) scores from a relation. Layer-based methods are representative approaches for processing top-k queries efficiently. These methods construct a list of layers, where the ith layer contains the tuples that can potentially be the top-i answer. Thus, layer-based methods can answer a top-k query by reading at most k layers. To construct layers, existing layer-based methods use the convex skyline, the convex hull, or the skyline. Among them, the convex skyline is constructed by computing the convex hull over the skyline. Accordingly, the layers of the convex skyline are relatively smaller than those of the convex hull, and its index building time is relatively shorter than that of the skyline. However, for large, high-dimensional databases, the convex skyline suffers from long index building time and large memory usage, because most objects can become skyline points. This paper focuses on how to build an index that contains fewer objects than the skyline and takes less time to construct than the convex skyline. Specifically, we propose a method called Approximate Convex Skyline Enhanced (AppCSE), which reduces the index building time and memory usage of the convex skyline. In the proposed method, we first construct the skyline, then partition the region of the skyline into multiple subregions, and compute the convex hull in each subregion with virtual objects. After that, AppCSE combines the objects obtained from these convex hull computations. Through various experiments with synthetic and real datasets, we demonstrate that the proposed method significantly reduces the index building time and memory usage compared to the existing methods. In addition, we show that the degradation of query performance is negligible when AppCSE is used as the layering scheme.
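To make the layering idea concrete, here is a minimal Python sketch of the exact convex-skyline baseline that AppCSE approximates. It is not AppCSE itself: the subregion partitioning and virtual objects are not modeled. The sketch assumes two attributes where larger values are preferred, a linear scoring function with non-negative weights, and distinct points. All function names are hypothetical. Each layer is the convex hull over the skyline of the points not yet assigned to a layer, so a top-k query only needs to read the first k layers.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # assumption: two attributes, larger is better


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is at least as good on both attributes and differs from q."""
    return p[0] >= q[0] and p[1] >= q[1] and p != q


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [q for q in points if not any(dominates(p, q) for p in points)]


def cross(o: Point, a: Point, b: Point) -> int:
    """Z-component of (a - o) x (b - o); positive means a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_skyline(points: Sequence[Point]) -> List[Point]:
    """Convex hull over the skyline (2-D): the skyline points on the upper hull."""
    sky = sorted(skyline(points))  # x ascending; y descends along a 2-D skyline
    hull: List[Point] = []
    for p in sky:
        # Drop the last hull point while it lies strictly below the chord to p.
        # Collinear points are kept so tied candidates stay in the earlier layer.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) > 0:
            hull.pop()
        hull.append(p)
    return hull


def build_layers(points: Sequence[Point]) -> List[List[Point]]:
    """Peel convex-skyline layers until every point is assigned to a layer."""
    remaining = list(points)
    layers: List[List[Point]] = []
    while remaining:
        layer = convex_skyline(remaining)
        layers.append(layer)
        chosen = set(layer)
        remaining = [p for p in remaining if p not in chosen]
    return layers


def top_k(layers: List[List[Point]], weights: Tuple[int, int], k: int) -> List[Point]:
    """Answer a top-k query by reading at most the first k layers."""
    def score(p: Point) -> int:
        return weights[0] * p[0] + weights[1] * p[1]

    candidates = [p for layer in layers[:k] for p in layer]
    return sorted(candidates, key=score, reverse=True)[:k]
```

For example, `build_layers([(1, 9), (3, 8), (5, 5), (8, 2), (9, 1), (2, 2), (4, 6), (6, 3)])` places `(1, 9)`, `(3, 8)`, and `(9, 1)` in the first layer. The skyline points `(4, 6)`, `(5, 5)`, `(6, 3)`, and `(8, 2)` are deferred because they lie below the hull. This illustrates why convex-skyline layers are smaller than skyline layers. Reading only k layers is sufficient because, for any non-negative weights, each layer contains a point scoring at least as high as any point in later layers.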