Abstract

Efficient processing of Top-k queries is a crucial requirement in many dynamic environments that involve massive amounts of data. Observed in many real scenario, a Top-k query often consists of two components to reflect a user's preference: selection condition and ranking function. Users can set their own sort function, simultaneously can choose to query different interesting subsets of the data. For traditional database, scholars have studied in the Top-k algorithm. However, previous works do not adapt to the situation when object attribute values is dynamic or the data amount is great. To address the problem of dynamic Top-k query, we propose a Trunk Tree Index (TTI) structure based on grid index. To remedy the lack ability of dealing these data, we describe a novel grid index method and give the GID-Top-k algorithm. The TTI index is determined according to the dominating relation between grids, and the dominating relation is further divided into father and son relation, brother relation and so on. TTI index can acquire the position of the grid subscript when there exists insert, delete and update. Furthermore, the process of pruning in Grid Index based Dynamic Top-k can effectively computing the k dominated capability through the summary of the grid in the Trunk tree Index, and minimizing the number of k dominated capability to be counted by looking for determinant units. Divide the influence area and free area according to the division of the region. Finally, pruning the data set, and if the data changes in the free area, there is no need to recalculate Top-k result. It can reduce the probability of recalculation when data changes frequently. The building blocks and calculation modules of the multi-dimensional TTI index are given, the grid dominance relationships, the relationship between brothers and sons are clarified from the two-dimensional to the multi-dimensional case with TTI. The pruning rules and calculation of free area and influence are related with the operation of searching grid, which avoid the stochastic search in grid using linked list. The cost of building and maintaining of the index is also analyzed with query efficiency. The experiment chooses different datasets to test, which compared with the baseline algorithm Top-k, RankCube and CIA algorithm separately. Because grid index is different from general index, the n dimensions on query space is divided into such as large area of the grid, under the condition of the same configuration parameters in other experiments, the distribution of data in each grid density on the algorithm performance is going to have an impaction of the presented algorithms. Experimental results verify the effectiveness of the proposed algorithm. Experimental results verify the effectiveness of the algorithm, the experimental data show that in the static case, the query efficiency of the algorithm in this paper up to eight times faster than the traditional Top-k algorithm, dynamic cases up to 10 times faster than the existing comparison objects. We will develop the improved Top-k queries to be an approach for big data analysis, and multi-criteria retrieval applications.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call