Abstract
We study the continuous similarity search problem for evolving queries which has recently been formulated. Given a data stream and a database composed of n sets of items, the purpose of this problem is to maintain the top-k most similar sets to the query which evolves over time and consists of the latest W items in the data stream. For this problem, the previous exact algorithm adopts a pruning strategy which, at the present time T, decides the candidates of the top-k most similar sets from past similarity values and computes the similarity values only for them. This paper proposes a new exact algorithm which shortens the execution time by computing the similarity values only for sets whose similarity values at T can change from time \(T-1\). We identify such sets very fast with frequency-based inverted lists (FIL). Moreover, we derive the similarity values at T in O(1) time by updating the previous values computed at time \(T-1\). Experimentally, our exact algorithm runs faster than the previous exact algorithm by one order of magnitude.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have