Distributed top-k similarity query on big trajectory streams

Zhigang Zhang,Cheqing Jin,Yilin Wang,Xiaodong Qi,Jiali Mao,Aoying Zhou

doi:10.1007/s11704-018-7234-6

Abstract

Recently, big trajectory data streams are generated in distributed environments with the popularity of smartphones and other mobile devices. Distributed top-k similarity query, which finds k trajectories that are most similar to a given query trajectory from all remote sites, is critical in this field. The key challenge in such a query is how to reduce the communication cost due to the limited network bandwidth resource. Although this query can be solved by sending the query trajectory to all the remote sites, in which the pairwise similarities are computed precisely. However, the overall cost, O(n · m), is huge when n or m is huge, where n is the size of query trajectory and m is the number of remote sites. Fortunately, there are some cheap ways to estimate pairwise similarity, which filter some trajectories in advance without precise computation. In order to overcome the challenge in this query, we devise two general frameworks, into which concrete distance measures can be plugged. The former one uses two bounds (the upper and lower bound), while the latter one only uses the lower bound. Moreover, we introduce detailed implementations of two representative distance measures, Euclidean and DTW distance, after inferring the lower and upper bound for the former framework and the lower bound for the latter one. Theoretical analysis and extensive experiments on real-world datasets evaluate the efficiency of proposed methods.

Full Text