Exact Trajectory Similarity Search With N-tree: An Efficient Metric Index for kNN and Range Queries

Abstract

Similarity search is the problem of finding, in a collection of objects, those that are similar to a given query object. It is a fundamental problem in modern applications, and the objects considered may be as diverse as locations in space, text documents, images, X (formerly Twitter) messages, or trajectories of moving objects. In this article, we are motivated by the latter application. Trajectories are recorded movements of mobile objects such as vehicles, animals, public transportation, or parts of the human body. We propose a novel distance function called DistanceAvg to capture the similarity of such movements. To be practical, this distance measure needs index support. Fortunately, we do not need to start from scratch. A generic and unifying approach is the metric space model, which organizes a set of objects solely by a distance (similarity) function satisfying certain natural properties, such as symmetry and the triangle inequality. Our function DistanceAvg is a metric. Although metric indexes have been studied for decades and many such structures are available, they do not offer the best performance for trajectories. In this article, we propose a new design that outperforms the best existing indexes for kNN queries and is equally good for range queries. It is especially suitable for expensive distance functions such as those that arise in trajectory similarity search. In many applications, kNN queries are more practical than range queries because it may be difficult to determine an appropriate search radius. Our index provides exact result sets for the given distance function.
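The abstract does not define DistanceAvg or describe how N-tree is organized, so the following is only a generic sketch of why the metric property matters for exact search. With a metric distance d, precomputed distances to a few pivot objects p give the lower bound |d(q, p) − d(o, p)| ≤ d(q, o), and any object whose bound already exceeds the search radius (or the current k-th nearest distance) can be skipped without evaluating the expensive distance, while the results stay exact. The structure shown is a simple LAESA-style pivot table, not the N-tree, and `traj_dist` (the mean distance between corresponding points of equal-length trajectories) is a hypothetical stand-in for DistanceAvg.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def traj_dist(a: Trajectory, b: Trajectory) -> float:
    """Hypothetical stand-in for DistanceAvg (NOT the paper's definition):
    mean Euclidean distance between corresponding points of two
    equal-length trajectories. It is a metric, which is all the pruning needs."""
    if len(a) != len(b) or not a:
        raise ValueError("sketch assumes non-empty, equal-length trajectories")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


class PivotTable:
    """LAESA-style pivot table, a generic metric index (not the N-tree)."""

    def __init__(self, objects: List[Trajectory], pivot_ids: List[int],
                 dist: Callable[[Trajectory, Trajectory], float] = traj_dist):
        self.objects = objects
        self.dist = dist
        self.pivots = [objects[i] for i in pivot_ids]
        # Precompute d(o, p) for every object o and pivot p at build time.
        self.table = [[dist(o, p) for p in self.pivots] for o in objects]

    def _lower_bounds(self, q: Trajectory) -> List[float]:
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for every pivot p,
        # so the maximum over all pivots is a lower bound on d(q, o).
        qp = [self.dist(q, p) for p in self.pivots]
        return [max((abs(a - b) for a, b in zip(qp, row)), default=0.0)
                for row in self.table]

    def range_query(self, q: Trajectory, r: float) -> List[int]:
        lbs = self._lower_bounds(q)
        # Only objects whose lower bound is within r need the exact distance.
        return [i for i, o in enumerate(self.objects)
                if lbs[i] <= r and self.dist(q, o) <= r]

    def knn(self, q: Trajectory, k: int) -> List[Tuple[float, int]]:
        if k <= 0:
            return []
        lbs = self._lower_bounds(q)
        heap: List[Tuple[float, int]] = []  # max-heap of (-distance, index)
        # Visit objects in order of increasing lower bound.
        for i in sorted(range(len(self.objects)), key=lbs.__getitem__):
            if len(heap) == k and lbs[i] >= -heap[0][0]:
                break  # no remaining object can be closer than the current k-th
            d = self.dist(q, self.objects[i])
            if len(heap) < k:
                heapq.heappush(heap, (-d, i))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i))
        return sorted((-neg_d, i) for neg_d, i in heap)
```

How many exact distance computations the bounds avoid depends on the data and on pivot selection; the sketch does not attempt to reproduce the N-tree or its reported performance.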

Similar Papers
  • Research Article
  • Citations: 3
  • 10.1007/s10586-015-0515-z
Similarity (range and kNN) queries processing on an Intel Xeon Phi coprocessor
  • Jan 6, 2016
  • Cluster Computing
  • Carlos M Toledo + 2 more

Nowadays, the evolution of information technologies requires fast similarity search tools for analyzing new data types such as audio, video, or images. Conventional search by keys or records is not applicable, and searching these databases is compute-intensive. For this reason, compute-intensive coprocessors (mainly NVIDIA GPUs) have been studied in recent years as a tool for accelerating sequential processing algorithms. In this work, we implement kNN and range queries on the recently launched Intel Xeon Phi coprocessor. We developed both exhaustive and indexing algorithms, the latter using the LC index. This index has been widely studied in sequential computing to accelerate similarity search on multimedia databases. We implement and compare different exhaustive and indexing versions, showing some key factors in Xeon Phi for this type of search. For indexing algorithms, we used a strategy based on distributing clusters among cores (LC MIC Dist-C), obtaining a speed-up of up to 168× over the sequential exhaustive algorithm. Our exhaustive algorithms on Xeon Phi for range queries achieve up to 22× speed-up over the sequential counterpart, compared to 12× on a 20-core machine, and a similar advantage is achieved for kNN queries. Compared with GPUs, we obtain higher performance with our indexing algorithms on Intel Xeon Phi. However, GPUs are faster for exhaustive algorithms with memory-aligned access. Our exhaustive approaches on Xeon Phi can be used on a wide class of databases, for example, non-metric spaces. Finally, we extend our algorithms to large databases that do not fit in the coprocessor memory, showing good scalability with the number of elements.

  • Conference Article
  • Citations: 21
  • 10.5555/1182635.1164182
Similarity search: a matching based approach
  • Sep 1, 2006
  • Anthony K H Tung + 3 more

Similarity search is a crucial task in multimedia retrieval and data mining. Most existing work has modelled this problem as the nearest neighbor (NN) problem, which considers the distance between the query object and the data objects over a fixed set of features. Such an approach has two drawbacks: 1) it leaves many partial similarities uncovered; 2) the distance is often affected by a few dimensions with high dissimilarity. To overcome these drawbacks, we propose the k-n-match problem in this paper. The k-n-match problem models similarity search as matching between the query object and the data objects in n dimensions, where n is a given integer smaller than the dimensionality d, and these n dimensions are determined dynamically so that the query object and the data objects returned in the answer set match best. The k-n-match query is expected to be superior to the kNN query in discovering partial similarities; however, it may not be as good at identifying full similarity, since a single value of n may only correspond to a particular aspect of an object instead of its entirety. To address this problem, we further introduce the frequent k-n-match problem, which finds a set of objects that appears most frequently in the k-n-match answers for a range of n values. Moreover, we propose search algorithms for both problems. We prove that our proposed algorithm is optimal in terms of the number of individual attributes retrieved, which is especially useful for information retrieval from multiple systems. We can also apply the proposed algorithmic strategy to obtain a disk-based algorithm for the (frequent) k-n-match query.
By a thorough experimental study using both real and synthetic data sets, we show that: 1) the k-n-match query yields better result than the kNN query in identifying similar objects by partial similarities; 2) our proposed method (for processing the frequent k-n-match query) outperforms existing techniques for similarity search in terms of both effectiveness and efficiency.
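A brute-force sketch may help make the k-n-match definition concrete. It assumes that the n-match difference between q and a point p is the n-th smallest per-dimension difference |q_i − p_i|, which picks the n best-matching dimensions separately for each point, as the abstract describes. The function names are hypothetical, and the linear scan is for illustration only; it is not the authors' optimal algorithm.

```python
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def n_match_difference(q: Vector, p: Vector, n: int) -> float:
    """n-th smallest |q_i - p_i| (n is 1-based): the n best-matching
    dimensions of p all lie within this difference of q."""
    return sorted(abs(a - b) for a, b in zip(q, p))[n - 1]


def k_n_match(data: List[Vector], q: Vector, k: int, n: int) -> List[int]:
    """Indices of the k points with the smallest n-match difference
    (brute-force scan; ties broken by index)."""
    return sorted(range(len(data)),
                  key=lambda i: (n_match_difference(q, data[i], n), i))[:k]


def frequent_k_n_match(data: List[Vector], q: Vector, k: int,
                       n_lo: int, n_hi: int) -> List[int]:
    """The k points that appear most often in the k-n-match answers
    for n = n_lo..n_hi (ties broken by index)."""
    counts: Counter = Counter()
    for n in range(n_lo, n_hi + 1):
        counts.update(k_n_match(data, q, k, n))
    return sorted(counts, key=lambda i: (-counts[i], i))[:k]


if __name__ == "__main__":
    data = [[1, 1, 100], [1, 50, 50], [2, 2, 2]]
    q = [1, 1, 1]
    print(k_n_match(data, q, k=1, n=2))  # [0]
    print(k_n_match(data, q, k=1, n=3))  # [2]
```

In the example, point 2 is the Euclidean nearest neighbor of q, yet point 0 is the 1-2-match answer because it agrees with q exactly in two dimensions, which is the kind of partial similarity the abstract refers to.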

  • Dissertation
  • 10.4225/03/58b607caf344c
Range and region query processing in spatial databases
  • Feb 28, 2017
  • Kefeng Xuan

With the boom of spatial databases, more and more spatial queries, which play a significant role in many academic and industrial areas, have been proposed and studied extensively in the last decade. One of the most fundamental of these is range search, which returns all objects of interest within a predefined area. Because of the importance of spatial queries, much research has concentrated on processing various queries in spatial databases, especially k nearest neighbor (kNN) queries and their variations. However, although range search is a fundamental query in spatial databases, it has received far less attention. Existing works cannot process range queries efficiently, especially in non-Euclidean space or on moving objects. Furthermore, existing works on spatial queries retrieve only point objects; none of them can find non-point objects, due to the difficulty of representing and indexing such objects in spatial databases. Motivated by these outstanding problems, in this thesis we discuss several novel range and region queries and provide efficient solutions for them in spatial databases. The following paragraphs describe our contributions. In the first part, we present several algorithms to process point-expected range queries, which retrieve spatial objects within a specific distance from a query point. We are the first to investigate range queries under many different practical constraints. We conduct a theoretical analysis to show the precision and effectiveness of our algorithms, and extensive experimental results provide practical evidence for this analysis. We then discuss point-expected range queries in a dynamic setting, where the query or the objects of interest move continuously. Our experimental results demonstrate that our approach outperforms existing techniques in most instances.
Among these, our algorithms for constrained range queries are based on Voronoi expansion rather than incremental expansion, so the response time and I/O cost of range queries are much lower than with existing works. Meanwhile, these queries diversify range search in spatial databases and address many novel query types. Our algorithms for monitoring range queries involving moving objects reduce computation and communication costs significantly compared with other approaches. In the second part, we propose a new class of range queries, named region-expected range queries, which find one or more areas according to the locations of a given set of objects. This class is needed because, with the rapid progress of geographic information systems, typical queries in spatial databases cannot fulfill users' requirements. In this part, we focus on two queries in this class, namely kNN region queries and optimum region queries. We are the first to study this sort of range query in spatial databases. We provide two algorithms for each query and analyze their performance through extensive theoretical analysis and experiments. We are the first to investigate retrieving non-point objects in spatial databases. These sorts of spatial queries provide rich functionality in industrial and commercial areas, including geographic information systems, decision support systems, and so forth.

  • Research Article
  • Citations: 25
  • 10.1007/s00778-020-00619-4
Scalable data series subsequence matching with ULISSE
  • Jul 4, 2020
  • The VLDB Journal
  • Michele Linardi + 1 more

Data series similarity search is an important operation, and at the core of several analysis tasks and applications related to data series collections. Despite the fact that data series indexes enable fast similarity search, all existing indexes can only answer queries of a single length (fixed at index construction time), which is a severe limitation. In this work, we propose ULISSE, the first data series index structure designed for answering similarity search queries of variable length (within some range). Our contribution is twofold. First, we introduce a novel representation technique, which effectively and succinctly summarizes multiple sequences of different length. Based on the proposed index, we describe efficient algorithms for approximate and exact similarity search, combining disk-based index visits and in-memory sequential scans. Our approach supports non-Z-normalized and Z-normalized sequences and can be used with no changes with both Euclidean distance and dynamic time warping, for answering both k-NN and ε-range queries. We experimentally evaluate our approach using several synthetic and real datasets. The results show that ULISSE is several times, and up to orders of magnitude, more efficient in terms of both space and time cost, when compared to competing approaches.

  • Conference Article
  • Citations: 2
  • 10.1109/bigdata.2015.7363948
Indexing historical spatio-temporal data in the cloud
  • Oct 1, 2015
  • Chong Zhang + 3 more

With the development of various Cloud platforms, providing spatio-temporal database services is an essential requirement for many applications, e.g., location-based services in the Cloud. However, many previous works on processing queries in distributed environments cannot be applied to spatio-temporal queries, which play a significant role in spatio-temporal databases. In this paper, we propose an efficient and scalable index for answering spatio-temporal queries in the Cloud. The index is a peer-to-peer overlay network composed of two ring-structured overlays: a spatial ring that globally indexes the spatial dimension and a temporal ring for the temporal one. It features a cost-aware function, i.e., a query can always be accomplished at low cost by using histograms, maintained by each Cloud node, that record the distributions of Cloud nodes and data. Both range and kNN queries benefit from this mechanism; additionally, an elaborate kNN processing algorithm is proposed, with which Cloud nodes deliberately send messages only to the result-related destinations. Furthermore, optimizations are proposed to achieve low-cost index maintenance and scalable kNN query processing. Experiments on both synthetic and real datasets show that our index supports efficient and scalable range and kNN queries, even for skewed distributions.

  • Peer Review Report
  • Citations: 53
  • 10.7554/elife.17086.021
Author response: A map of abstract relational knowledge in the human hippocampal–entorhinal cortex
  • Feb 6, 2017
  • Mona M Garvert + 2 more

The hippocampal–entorhinal system encodes a map of space that guides spatial navigation. Goal-directed behaviour outside of spatial navigation similarly requires a representation of abstract forms of relational knowledge. This information relies on the same neural system, but it is not known whether the organisational principles governing continuous maps may extend to the implicit encoding of discrete, non-spatial graphs. Here, we show that the human hippocampal–entorhinal system can represent relationships between objects using a metric that depends on associative strength. We reconstruct a map-like knowledge structure directly from a hippocampal–entorhinal functional magnetic resonance imaging adaptation signal in a situation where relationships are non-spatial rather than spatial, discrete rather than continuous, and unavailable to conscious awareness. Notably, the measure that best predicted a behavioural signature of implicit knowledge and blood oxygen level-dependent adaptation was a weighted sum of future states, akin to the successor representation that has been proposed to account for place and grid-cell firing patterns.
DOI: http://dx.doi.org/10.7554/eLife.17086.001

  • Conference Article
  • Citations: 5
  • 10.1137/1.9781611972788.57
On Indexing High Dimensional Data with Uncertainty
  • Apr 24, 2008
  • Charu C Aggarwal + 1 more

In this paper, we examine the problem of distance function computation and indexing uncertain data in high dimensionality for nearest neighbor and range queries. Because of the inherent noise in uncertain data, traditional distance function measures such as the Lq-metric and their probabilistic variants are not qualitatively effective. This problem is further magnified by the sparsity issue in high dimensionality. We examine methods of computing distance functions for high dimensional data that are qualitatively effective and friendly to the use of indexes. We also show how to construct an effective index structure to handle uncertain similarity and range queries in high dimensionality. Typical range queries in high dimensional space use only a subset of the ranges to resolve the queries. Furthermore, it is often desirable to run similarity queries with only a subset of the large number of dimensions. Such queries are difficult to resolve with traditional index structures, which use the entire set of dimensions. We propose query-processing techniques that use effective search methods on the index to compute the final results. We discuss experimental results on a number of real and synthetic data sets in terms of effectiveness and efficiency. We show that the proposed distance measures are not only more effective than traditional Lq-norms, but can also be computed more efficiently over our proposed index structure.

  • Conference Article
  • Citations: 27
  • 10.1109/icde.2008.4497589
On High Dimensional Indexing of Uncertain Data
  • Apr 1, 2008
  • Charu C Aggarwal + 1 more

In this paper, we examine the problem of distance function computation and indexing uncertain data in high dimensionality for nearest neighbor and range queries. Because of the inherent noise in uncertain data, traditional distance function measures such as the Lq-metric and their probabilistic variants are not qualitatively effective. This problem is further magnified by the sparsity issue in high dimensionality. We examine methods of computing distance functions for high dimensional data that are qualitatively effective and friendly to the use of indexes. We also show how to construct an effective index structure to handle uncertain similarity and range queries in high dimensionality. Typical range queries in high dimensional space use only a subset of the ranges to resolve the queries. Furthermore, it is often desirable to run similarity queries with only a subset of the large number of dimensions. Such queries are difficult to resolve with traditional index structures, which use the entire set of dimensions. We propose query-processing techniques that use effective search methods on the index to compute the final results. We discuss experimental results on a number of real and synthetic data sets in terms of effectiveness and efficiency. We show that the proposed distance measures are not only more effective than traditional Lq-norms, but can also be computed more efficiently over our proposed index structure.

  • Conference Article
  • Citations: 24
  • 10.1145/3469830.3470892
SPRIG: A Learned Spatial Index for Range and kNN Queries
  • Aug 23, 2021
  • Songnian Zhang + 3 more

A corpus of recent work has revealed that the learned index can improve query performance while reducing the storage overhead. It potentially offers an opportunity to address the spatial query processing challenges caused by the surge in location-based services. Although several learned indexes have been proposed to process spatial data, the main idea behind these approaches is to utilize the existing one-dimensional learned models, which requires either converting the spatial data into one-dimensional data or applying the learned model on individual dimensions separately. As a result, these approaches cannot fully utilize or take advantage of the information regarding the spatial distribution of the original spatial data. To this end, in this paper, we exploit it by using the spatial (multi-dimensional) interpolation function as the learned model, which can be directly employed on the spatial data. Specifically, we design an efficient SPatial inteRpolation functIon based Grid index (SPRIG) to process the range and kNN queries. Detailed experiments are conducted on real-world datasets. The results indicate that, compared to the traditional spatial indexes, our proposed learned index can significantly improve the index building and query processing performance with less storage overhead. Moreover, in the best case, our index achieves up to an order of magnitude better performance than ZM-index in range queries and is about 2.7×, 3×, and 9× faster than the multi-dimensional learned index Flood in terms of index building, range queries, and kNN queries, respectively.

  • Research Article
  • Citations: 18
  • 10.1007/s11276-012-0479-3
Spatial query processing in road networks for wireless data broadcast
  • Jul 6, 2012
  • Wireless Networks
  • Yanqiu Wang + 4 more

Recently, wireless broadcast environments have attracted significant attention due to their high scalability in broadcasting information to a large number of mobile subscribers. Broadcast is an especially promising dissemination method for heavily loaded environments in which a large number of requests of the same type are sent by users. There have been many recent studies on processing spatial queries via the broadcast model. However, little attention has been paid to spatial queries in road networks in wireless broadcast environments. In this paper, we focus on three common types of spatial queries, namely k nearest neighbor (kNN) queries, range queries, and reverse nearest neighbor (RNN) queries, in spatial networks for wireless data broadcast. Specifically, we propose a novel index for spatial queries in wireless broadcast environments (ISW). With its reasonable organization and effective pre-computed bounds, ISW provides a powerful framework for spatial queries. Furthermore, efficient algorithms are designed to handle kNN, range, and RNN queries separately based on ISW. The search space is clearly reduced, and consequently the client downloads as little data as possible for query processing, which conserves energy without significantly affecting efficiency. A detailed theoretical analysis of the cost model and experimental results are presented to verify the efficiency and effectiveness of ISW and our methods.

  • Research Article
  • Citations: 21
  • 10.1109/tkde.2007.190700
Efficient Similarity Search in Nonmetric Spaces with Local Constant Embedding
  • Jan 1, 2008
  • IEEE Transactions on Knowledge and Data Engineering
  • Lei Chen + 1 more

Similarity-based search has been a key factor for many applications such as multimedia retrieval, data mining, Web search and retrieval, and so on. There are two important issues related to the similarity search, namely, the design of a distance function to measure the similarity and improving the search efficiency. Many distance functions have been proposed, which attempt to closely mimic human recognition. Unfortunately, some of these well-designed distance functions do not follow the triangle inequality and are therefore nonmetric. As a consequence, efficient retrieval by using these nonmetric distance functions becomes more challenging, since most existing index structures assume that the indexed distance functions are metric. In this paper, we address this challenging problem by proposing an efficient method, that is, local constant embedding (LCE), which divides the data set into disjoint groups so that the triangle inequality holds within each group by constant shifting. Furthermore, we design a pivot selection approach for the converted metric distance and create an index structure to speed up the retrieval efficiency. Moreover, we also propose a novel method to answer approximate similarity search in the nonmetric space with a guaranteed query accuracy. Extensive experiments show that our method works well on various nonmetric distance functions and improves the retrieval efficiency by an order of magnitude compared to the linear scan and existing retrieval approaches with no false dismissals.
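The constant-shifting step can be illustrated for a single group of objects. If every triple satisfies d(x, z) ≤ d(x, y) + d(y, z) + c, then adding c to all distances between distinct objects yields a distance that obeys the triangle inequality, since d(x, z) + c ≤ (d(x, y) + c) + (d(y, z) + c). The sketch below, which assumes a symmetric distance matrix with a zero diagonal, computes the smallest such c. LCE's partitioning into groups, its pivot selection, and its approximate search are not shown, and the helper names are hypothetical.

```python
from itertools import permutations
from typing import List

Matrix = List[List[float]]


def min_shift(d: Matrix) -> float:
    """Smallest c >= 0 such that adding c to every off-diagonal entry of the
    (assumed symmetric, zero-diagonal) matrix d satisfies the triangle inequality:
    d(x,z) + c <= d(x,y) + d(y,z) + 2c  <=>  c >= d(x,z) - d(x,y) - d(y,z)."""
    c = 0.0
    for x, y, z in permutations(range(len(d)), 3):
        c = max(c, d[x][z] - d[x][y] - d[y][z])
    return c


def shift(d: Matrix, c: float) -> Matrix:
    """Add c to all distances between distinct objects; self-distances stay 0."""
    n = len(d)
    return [[0.0 if i == j else d[i][j] + c for j in range(n)] for i in range(n)]


def satisfies_triangle(d: Matrix, eps: float = 1e-12) -> bool:
    """Check d(x,z) <= d(x,y) + d(y,z) for all triples (small float tolerance)."""
    n = len(d)
    return all(d[x][z] <= d[x][y] + d[y][z] + eps
               for x in range(n) for y in range(n) for z in range(n))


if __name__ == "__main__":
    # Squared Euclidean distances between the 1-D points 0, 1, 2 (not a metric).
    d = [[0.0, 1.0, 4.0], [1.0, 0.0, 1.0], [4.0, 1.0, 0.0]]
    c = min_shift(d)
    print(c)  # 2.0
    print(satisfies_triangle(d), satisfies_triangle(shift(d, c)))  # False True
```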

  • Conference Article
  • Citations: 11
  • 10.1109/ssdbm.2007.11
Efficient Approximation of Spatial Network Queries using the M-Tree with Road Network Embedding
  • Jul 1, 2007
  • Kevin Shaw + 4 more

Spatial networks, such as road systems, operate differently from normal geospatial systems because objects are constrained to locations on the network. Performing queries on spatial networks demands entirely different solutions. Most spatial queries are processed efficiently using an R-Tree. The M-Tree is a tree index capable of indexing data in any metric space, and it can replace the R-Tree for spatial network queries such as range and kNN queries. The difficulty is that the M-Tree is only as efficient as the distance algorithm used on the underlying objects, and most network distance algorithms, such as A*, are too slow to allow the M-Tree to operate efficiently on spatial networks. The truncated road network embedding (tRNE) maps the network into a higher-dimensional space where any Lp metric can be used to efficiently compute an accurate approximation of network distance. Combining the M-Tree with tRNE yields an efficient index structure for computing spatial network queries. The M-Tree substantially outperforms network expansion, the most popular method of computing spatial network queries, when performing spatial network kNN and range queries.

  • Research Article
  • Citations: 36
  • 10.14778/1687627.1687630
Similarity search on Bregman divergence
  • Aug 1, 2009
  • Proceedings of the VLDB Endowment
  • Zhenjie Zhang + 3 more

In this paper, we examine the problem of indexing over non-metric distance functions. In particular, we focus on a general class of distance functions, namely Bregman Divergence [6], to support nearest neighbor and range queries. Distance functions such as KL-divergence and the Itakura-Saito distance are special cases of Bregman divergence, with wide applications in statistics, speech recognition, and time series analysis, among others. Unlike in metric spaces, key properties such as the triangle inequality and distance symmetry do not hold for such distance functions. A direct adaptation of existing indexing infrastructure developed for metric spaces is thus not possible. We devise a novel solution to handle this class of distance measures by expanding and mapping points in the original space to a new extended space. Subsequently, we show how state-of-the-art tree-based indexing methods, for low to moderate dimensional datasets, and vector approximation file (VA-file) methods, for high dimensional datasets, can be adapted on this extended space to answer such queries efficiently. Improved distance bounding techniques and distribution-based index optimization are also introduced to improve the performance of query answering and index construction respectively, which can be applied on both R-trees and VA-files. Extensive experiments are conducted to validate our approach on a variety of datasets and a range of Bregman divergence functions.
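For reference, the Bregman divergence generated by a strictly convex, differentiable function F is D_F(x, y) = F(x) − F(y) − ⟨∇F(y), x − y⟩. The sketch below evaluates this definition for two standard generators: F(x) = Σ x_i log x_i gives the generalized KL divergence, and F(x) = −Σ log x_i gives the Itakura-Saito distance. It only illustrates the asymmetry that prevents direct use of metric indexes; the paper's mapping to an extended space is not reproduced, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def bregman(F: Callable[[Vector], float],
            grad_F: Callable[[Vector], List[float]],
            x: Vector, y: Vector) -> float:
    """D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>."""
    g = grad_F(y)
    return F(x) - F(y) - sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))


# F(x) = sum x_i log x_i  ->  generalized KL divergence (entries must be > 0).
def neg_entropy(v: Vector) -> float:
    return sum(t * math.log(t) for t in v)


def neg_entropy_grad(v: Vector) -> List[float]:
    return [math.log(t) + 1.0 for t in v]


# F(x) = -sum log x_i  ->  Itakura-Saito distance (entries must be > 0).
def burg(v: Vector) -> float:
    return -sum(math.log(t) for t in v)


def burg_grad(v: Vector) -> List[float]:
    return [-1.0 / t for t in v]


def generalized_kl(x: Vector, y: Vector) -> float:
    return bregman(neg_entropy, neg_entropy_grad, x, y)


def itakura_saito(x: Vector, y: Vector) -> float:
    return bregman(burg, burg_grad, x, y)


if __name__ == "__main__":
    x, y = [0.2, 0.8], [0.5, 0.5]
    # The two directions give different values: the divergence is not symmetric.
    print(generalized_kl(x, y), generalized_kl(y, x))
    print(itakura_saito(x, y), itakura_saito(y, x))
```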

  • Conference Article
  • Citations: 21
  • 10.1109/icdm.2004.10082
Efficient Density-Based Clustering of Complex Objects
  • Nov 1, 2004
  • S Brecheisen + 2 more

Nowadays, data mining in large databases of complex objects from scientific, engineering, or multimedia applications is becoming increasingly important. In many application domains, complex object representations along with complex distance functions are used to measure the similarity between objects. Often, not only are these complex distance measures available, but also simpler distance functions that can be computed much more efficiently. Traditionally, the well-known concept of multi-step query processing, which is based on exact and lower-bounding approximate distance functions, is used independently of data mining algorithms. In this paper, we demonstrate how the paradigm of multi-step query processing can be integrated into the two density-based clustering algorithms DBSCAN and OPTICS, resulting in a considerable efficiency boost. Our approach tries to confine itself to ε-range queries on the simple distance functions and carries out complex distance computations only at the stage of the clustering algorithm where they are necessary to compute the correct clustering result. In a broad experimental evaluation on real-world test data sets, we demonstrate that our approach accelerates the generation of flat and hierarchical density-based clusterings by more than an order of magnitude.
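The multi-step paradigm mentioned above relies on one property: if a cheap distance never exceeds the exact one, every object whose cheap distance to the query is larger than ε can be discarded without computing the exact distance, and no true result is lost. The following minimal sketch shows an ε-range query built this way. The distances are hypothetical placeholders (plain Euclidean distance as the "expensive" function and Euclidean distance on a prefix of the coordinates as its lower bound), and the integration into DBSCAN and OPTICS is not shown.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Dist = Callable[[Vector, Vector], float]


def exact_dist(a: Vector, b: Vector) -> float:
    """Placeholder for an expensive exact distance (here plain Euclidean)."""
    return math.dist(a, b)


def prefix_lower_bound(a: Vector, b: Vector, m: int = 2) -> float:
    """Cheap filter distance: Euclidean distance on the first m coordinates.
    Dropping non-negative squared terms can only shrink the sum, so this
    never exceeds exact_dist (the lower-bounding property)."""
    return math.dist(a[:m], b[:m])


def multistep_range(data: List[Vector], q: Vector, eps: float,
                    lb: Dist = prefix_lower_bound,
                    exact: Dist = exact_dist) -> Tuple[List[int], int]:
    """Filter-and-refine eps-range query. Returns (result indices, number of
    exact distance computations). No false dismissals as long as lb <= exact."""
    result: List[int] = []
    exact_calls = 0
    for i, o in enumerate(data):
        if lb(q, o) > eps:  # filter step: safely discarded
            continue
        exact_calls += 1    # refinement step: only for surviving candidates
        if exact(q, o) <= eps:
            result.append(i)
    return result, exact_calls
```

The saving is the gap between `len(data)` and the returned count of exact computations; it depends entirely on how tight the lower bound is for the data at hand.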

  • Research Article
  • Citations: 2
  • 10.1051/jp1:1993238
Decay of long-range field fluctuations induced by random structures: a unified spectral approach
  • Nov 1, 1993
  • Journal de Physique I
  • Didier Sornette

The problem of determining the power-law decay of the standard deviation σ² ∼ z^(−α) of the fluctuations of the field generated by a random array of elements (multipoles, ensembles of dislocations, etc.) as a function of the distance z from the array is reduced to the determination of two quantities: 1) the spectral power of the disorder in the low-k limit and 2) the structure of the Green function, as a function of wavenumber and distance, for a periodic array of the constituent elements. We thus recover straightforwardly all previously known results and derive new ones for more general constitutive elements. The general expression of the decay exponent is found to be α = 3 − β + 2(b − c), where β characterizes the self-affine structure of the disorder (β = 2: strong disorder; β = 0: weak disorder), and b and c are the exponents of the algebraic power-law corrections, in wavenumber and distance respectively, to the dominant exponential decay of the Green function for a periodic array of the constituent elements. The proposed spectral method automatically solves the generally difficult problem of renormalization and screening in arbitrary random structures.
