Approximate Continuous Top-K Queries over Memory Limitation-Based Streaming Data

Rui Zhu, Liu Meng, Bin Wang, Xiaochun Yang, Xiufeng Xia

doi:10.1007/978-3-031-00123-9_1

Abstract

The continuous top-k query over a sliding window is a fundamental problem in data stream processing. It retrieves the k objects with the highest scores each time the window slides. Existing efforts include exact and approximate algorithms. Their common idea is to maintain a small subset of the objects in the window, so that when the window slides, query results can be found from this subset as often as possible. However, the space cost of all existing efforts is high, i.e., linear in the number of objects in the window, so they cannot work over memory limitation-based streaming data, a common environment in real applications. In this paper, we define a novel query named the \(\rho\)-approximate continuous top-k query, which returns error-bounded answers to the system. Here, \(\rho\) is a threshold used for bounding the score ratio between approximate and exact results. To support the \(\rho\)-approximate continuous top-k query, we propose a novel framework named \(\rho\)-TOPK. It self-adaptively adjusts \(\rho\) based on the distribution of the streaming data, and thereby supports \(\rho\)-approximate continuous top-k queries over memory limitation-based streaming data. Theoretical analysis indicates that even in the worst case, both the running cost and the space cost of \(\rho\)-TOPK are independent of the data scale.

Keywords: Data stream, Continuous top-k query, Approximate, Memory limitation
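
To make the problem setting concrete, the following Python sketch shows a brute-force continuous top-k query over a count-based sliding window, together with one possible reading of a score-ratio bound. It is not the \(\rho\)-TOPK framework: it keeps every object in the window, which is exactly the linear space cost the abstract identifies as the problem. The function names, the count-based window, the assumption that scores are non-negative, and the criterion "the i-th approximate score is at least \(\rho\) times the i-th exact score, with \(0 < \rho \le 1\)" are all illustrative assumptions; the abstract does not state the formal definition.

```python
import heapq
from collections import deque


def sliding_top_k(scores, window_size, k):
    """Yield the exact top-k scores (descending) for each full window.

    Brute-force baseline: stores the whole window, so space is linear
    in the window size. This is NOT the paper's rho-TOPK framework.
    """
    window = deque()
    for score in scores:
        window.append(score)
        if len(window) > window_size:
            window.popleft()  # expire the oldest object
        if len(window) == window_size:
            yield heapq.nlargest(k, window)


def satisfies_ratio_bound(approx, exact, rho):
    """Assumed criterion (hypothetical, not taken from the paper):
    the i-th approximate score is at least rho times the i-th exact
    score, for non-negative scores and 0 < rho <= 1.
    """
    return len(approx) == len(exact) and all(
        a >= rho * e for a, e in zip(approx, exact)
    )


stream = [5, 1, 9, 3, 7, 8, 2]
exact_results = list(sliding_top_k(stream, window_size=4, k=2))
```

Under this assumed criterion, an approximate answer `[9, 4]` would be acceptable for the exact answer `[9, 5]` at `rho = 0.8`, while `[9, 3]` would not.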

Full Text