FIFO indexes for decomposable problems
This paper studies first-in-first-out (FIFO) indexes, each of which manages a dataset where objects are deleted in the same order as their insertions. We give a technique that converts a static data structure to a FIFO index for all decomposable problems, provided that the static structure can be constructed efficiently. We present FIFO access methods to solve several problems including half-plane search, nearest neighbor search, and extreme-point search. All of our structures consume linear space, and have optimal or near-optimal query cost.
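The abstract does not describe the conversion technique itself, so the following is only a minimal sketch of a related, classic idea: a two-stack queue that maintains a decomposable aggregate (the answer over a union of two sets is obtained by combining the answers over each set) under FIFO insertions and deletions. The class name `FifoAggregate` and its methods are hypothetical and are not taken from the paper.

```python
class FifoAggregate:
    """FIFO container answering a decomposable aggregate over its contents.

    `combine` must be associative (e.g. min, max, addition). This is the
    classic two-stack queue trick, not the paper's construction.
    """

    def __init__(self, combine):
        self.combine = combine
        # Each stack entry is (value, aggregate of this entry and all entries below it).
        self.front = []  # top of this stack is the oldest element
        self.back = []   # top of this stack is the newest element

    def insert(self, value):
        agg = value if not self.back else self.combine(self.back[-1][1], value)
        self.back.append((value, agg))

    def delete_oldest(self):
        if not self.front:
            # Move everything from back to front, reversing the order once.
            while self.back:
                value, _ = self.back.pop()
                agg = value if not self.front else self.combine(value, self.front[-1][1])
                self.front.append((value, agg))
        if not self.front:
            raise IndexError("delete from an empty FIFO structure")
        return self.front.pop()[0]

    def query(self):
        """Aggregate over all current elements, or None if empty."""
        if self.front and self.back:
            return self.combine(self.front[-1][1], self.back[-1][1])
        if self.front:
            return self.front[-1][1]
        if self.back:
            return self.back[-1][1]
        return None


window = FifoAggregate(min)
for v in (5, 3, 8):
    window.insert(v)
smallest = window.query()       # minimum over the three inserted values
oldest = window.delete_oldest() # removes the first value inserted
```

Each element is moved from `back` to `front` at most once, so insertions and deletions cost O(1) amortized `combine` calls. Query-dependent problems such as nearest neighbor search need a richer structure per part than a single stored aggregate.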
- Conference Article
- 10.1109/iccvw.2009.5457540
- Sep 1, 2009
Nearest neighbor (NN) search plays an important role in computer vision algorithms. In particular, NN search over the immensely large image datasets stored on the Internet is attracting attention. For such huge data, the main memory of a single PC is insufficient. As a solution, we propose in this paper an approximate NN search on a hard disk drive (HDD). The algorithm is based on the recently proposed Principal Component Hashing (PCH). In our algorithm, “PCH on HDD” (PCHD), the hash bins are represented by the leaf nodes of a B+ tree to handle the dynamic addition and deletion of data. Naturally, the search is slower than with the original PCH. However, experiments using a standard PC and 10000 stored images revealed several advantages of this approach: 1) memory consumption is 42 times smaller; 2) the first search, including the cold start-up time, is 4.3 times faster (PCH: 31.8 s, PCHD: 7.4 s); and 3) interestingly, successive searches are accelerated by the cache mechanism built into the operating system (mean search time decreases from 7.4 s to 0.64 s). We also confirmed that our algorithm performs NN search on a dataset of 1 million images with only 193 MB of memory, whereas PCH cannot because of its huge memory consumption. These properties indicate that the algorithm is suitable for non-time-critical NN search applications and for NN search engines called by web servers, where the search engine starts up in response to occasional queries.
- Book Chapter
- 10.1007/978-3-540-70575-8_7
- Jan 1, 2008
We study the nondeterministic cell-probe complexity of static data structures. We introduce cell-probe proofs (CPP), a proof system for the cell-probe model, which describes verifications instead of computations in the cell-probe model. We present a combinatorial characterization of CPP. With this novel tool, we prove the following lower bounds for the nondeterministic cell-probe complexity of static data structures: We show that there exist data structure problems which have super-constant nondeterministic cell-probe complexity. In particular, we show that for the exact nearest neighbor search (NNS) problem or the partial match problem in high dimensional Hamming space, there does not exist a static data structure with poly(n) cells, each of which contains n^{o(1)} bits, such that the nondeterministic cell-probe complexity is O(1), where n is the number of points in the data set for the NNS or partial match problem. For the polynomial evaluation problem, if single-cell nondeterministic probes are sufficient, then either the size of a single cell is close to the size of the whole polynomial, or the total size of the data structure is close to that of a naive data structure that stores results for all possible queries.
- Research Article
124
- 10.1145/1806907.1806912
- Jul 1, 2010
- ACM Transactions on Database Systems
Nearest Neighbor (NN) search in high-dimensional space is an important problem in many applications. From the database perspective, a good solution needs to have two properties: (i) it can be easily incorporated in a relational database, and (ii) its query cost should increase sublinearly with the dataset size, regardless of the data and query distributions. Locality-Sensitive Hashing (LSH) is a well-known methodology fulfilling both requirements, but its current implementations either incur expensive space and query cost, or abandon its theoretical guarantee on the quality of query results. Motivated by this, we improve LSH by proposing an access method called the Locality-Sensitive B-tree (LSB-tree) to enable fast, accurate, high-dimensional NN search in relational databases. The combination of several LSB-trees forms an LSB-forest that has strong quality guarantees, yet dramatically improves on the efficiency of the previous LSH implementation having the same guarantees. In practice, the LSB-tree itself is also an effective index which consumes linear space, supports efficient updates, and provides accurate query results. In our experiments, the LSB-tree was faster than: (i) iDistance (a famous technique for exact NN search) by two orders of magnitude, and (ii) MedRank (a recent approximate method with nontrivial quality guarantees) by one order of magnitude, and meanwhile returned much better results. As a second step, we extend our LSB technique to solve another classic problem, called Closest Pair (CP) search, in high-dimensional space. The long-term challenge for this problem has been to achieve subquadratic running time at very high dimensionalities, where most existing solutions fail. We show that, using an LSB-forest, CP search can be accomplished in (worst-case) time significantly lower than the quadratic complexity, while still ensuring very good quality.
In practice, accurate answers can be found using just two LSB-trees, thus giving a substantial reduction in the space and running time. In our experiments, our technique was faster than: (i) distance browsing (a well-known method for solving the problem exactly) by several orders of magnitude, and (ii) D-shift (an approximate approach with theoretical guarantees in low-dimensional space) by one order of magnitude, and at the same time returned better results.
- Research Article
- 10.23887/jstundiksha.v12i3.67809
- Jan 22, 2024
- JST (Jurnal Sains dan Teknologi)
Distribution planning under the Vehicle Routing Problem (VRP) faces the difficulty of finding minimal routes from depots to customer locations that lie in different places and have different total demands. The purpose of this study is to analyze the problem of transportation routes in the distribution of products from the initial distribution location to users. This type of research is qualitative research. This research was conducted at PT. Nusa Persada Concrete Creations. The Nearest Neighbor method is used to determine the initial distribution routes. The Local Search method is then applied to evaluate and improve the routes produced by the Nearest Neighbor method. The data analysis process thus consists of several stages using the Nearest Neighbor method and the Local Search method. The results show that a VRP model using the Nearest Neighbor and Local Search methods can be applied to determine ready-mix delivery routes at PT. Nusapersada Concrete Creations. The newly generated routes are a route-improvement solution for the company: applying the model results in new routes with shorter distances, faster completion times, and fuel cost savings for the trucks compared to the initial routes. This makes distance and time more effective, as well as more cost efficient.
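The two-stage procedure described above (construct a route with Nearest Neighbor, then improve it with a local search) can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's model: it uses a single vehicle with no capacity limit, Euclidean distances between made-up coordinates, and 2-opt as the local search move. The study does not say which local search move it uses, so 2-opt is an assumption, and all function names are hypothetical.

```python
import math


def route_length(route, pts):
    """Total Euclidean length of a route given as a list of point indices."""
    return sum(math.dist(pts[a], pts[b]) for a, b in zip(route, route[1:]))


def nearest_neighbor_route(pts, depot=0):
    """Start at the depot, repeatedly visit the closest unvisited point, return to the depot."""
    unvisited = set(range(len(pts))) - {depot}
    route = [depot]
    while unvisited:
        last = route[-1]
        # Ties are broken by index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (math.dist(pts[last], pts[j]), j))
        route.append(nxt)
        unvisited.remove(nxt)
    route.append(depot)
    return route


def two_opt(route, pts):
    """Local search: reverse a segment whenever doing so shortens the route."""
    best = route[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 2):
            for j in range(i + 1, len(best) - 1):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if route_length(candidate, pts) < route_length(best, pts) - 1e-12:
                    best = candidate
                    improved = True
    return best


# Hypothetical coordinates: index 0 is the depot, the rest are delivery sites.
sites = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 3)]
initial = nearest_neighbor_route(sites)
final = two_opt(initial, sites)
```

The local search never returns a longer route than the one it starts from, because a candidate is accepted only when it is strictly shorter.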
- Conference Article
16
- 10.1145/2505515.2505522
- Oct 27, 2013
Nearest neighbor proximity search in large graphs is an important analysis primitive with a variety of applications in graph data from different domains. We propose a novel proximity measure for weighted graphs called Effective Importance which incorporates multiple paths between nodes and captures the inherent structural clusters within a network. We develop effective bounds on the EI value using a modified small subnetwork around a query node, enabling scalable exact nearest neighbor (NN) search at query time. Our NN search does not require heavy offline analysis or holistic knowledge of the graph, making our method suitable for very large dynamically changing networks or composite network overlays. We employ our NN search algorithm on social, information and biological networks and demonstrate the effectiveness and scalability of the approach. For million-node networks, our method retrieves the exact top 20 neighbors using less than 0.2% of the network edges in a fraction of a second on a conventional desktop machine. We also evaluate the effectiveness of our proximity measure and NN search for three applications, namely (i) finding good local clusters, (ii) network sparsification and (iii) prediction of node attributes in information networks. The EI measure and NN search method outperform recent counterparts from the literature in all applications.
- Dissertation
- 10.5353/th_b4784954
- Jan 1, 2012
Nearest Neighbor (NN for short) queries are important in emerging applications, such as wireless networks, location-based services, and data stream applications, where the data obtained are often imprecise. The imprecision or imperfection of the data sources is modeled by uncertain data in recent research works. Handling uncertainty is important because this issue affects the quality of query answers. Although queries on uncertain data are useful, evaluating the queries on them can be costly, in terms of I/O or computational efficiency. In this thesis, we study how to efficiently evaluate NN queries on uncertain data.
Given a query point q and a set of uncertain objects O, the possible nearest neighbor query returns a set of candidates which have non-zero probabilities to be the query answer. It is also interesting to ask “which region has the same set of possible nearest neighbors” and “which region has one specific object as its possible nearest neighbor”. To reveal the relationship between the query space and nearest neighbor answers, we propose the UV-diagram, where the query space is split into disjoint partitions, such that each partition is associated with a set of objects. If a query point is located inside the partition, its possible nearest neighbors could be directly retrieved. However, the number of such partitions is exponential and the construction effort can be expensive. To tackle this problem, we propose an alternative concept, called UV-cell, and efficient algorithms for constructing it. The UV-cell has an irregular shape, which incurs difficulties in storage, maintenance, and query evaluation. We design an index structure, called UV-index, which is an approximated version of the UV-diagram. Extensive experiments show that the UV-index could efficiently answer different variants of NN queries, such as Probabilistic Nearest Neighbor Queries and Continuous Probabilistic Nearest Neighbor Queries.
Another problem studied in this thesis is the trajectory nearest neighbor query. Here the query point is restricted to a pre-known trajectory. In applications (e.g. monitoring potential threats along a flight/vessel's trajectory), it is useful to derive nearest neighbors for all points on the query trajectory. Simple solutions, such as sampling or approximating the locations of uncertain objects as points, fail to achieve a good query quality. To handle this problem, we design efficient algorithms and optimization methods for this query. Experiments show that our solution can efficiently and accurately answer this query. Our solution is also scalable to large datasets and long trajectories.
- Conference Article
2
- 10.1109/iccsec.2017.8447050
- Dec 1, 2017
For a given set of moving objects and a k nearest neighbor query q, processing a Continuous K Nearest Neighbor (CKNN) query means searching for the k objects nearest to q and continuously monitoring the result in real time as the objects and the query point move. Most existing work on processing CKNN queries has flaws in index maintenance, real-time updating of results, or query cost, and therefore cannot fully solve the problem. To address this challenge, we propose an incremental search algorithm (IS-CKNN) that handles CKNN queries over a tremendous volume of moving objects using a Random Estimation method. In particular, our approach adopts a grid index to maintain the moving objects in real time. For a given query q, IS-CKNN first employs the YPK-CNN algorithm to compute the initial result of q. Next, it applies the Random Estimation (RE) method to rapidly estimate, based on the previous search scope, an appropriate search region that is guaranteed to cover the k nearest neighbors of q. This strategy immediately computes an appropriate search space for the moving query without iteratively enlarging the search region, which greatly enhances search efficiency. Finally, we conduct extensive experiments to fully evaluate the performance of our proposal.
- Conference Article
51
- 10.23919/date51398.2021.9474025
- Feb 1, 2021
Nearest neighbor (NN) search is an essential operation in many applications, such as one/few-shot learning and image classification. As such, fast and low-energy hardware support for accurate NN search is highly desirable. Ternary content-addressable memories (TCAMs) have been proposed to accelerate NN search for few-shot learning tasks by implementing $L_\infty$ and Hamming distance metrics, but they cannot achieve software-comparable accuracies. This paper proposes a novel distance function that can be natively evaluated with multi-bit content-addressable memories (MCAMs) based on ferroelectric FETs (FeFETs) to perform a single-step, in-memory NN search. Moreover, this approach achieves accuracies comparable to floating-point precision implementations in software for NN classification and one/few-shot learning tasks. As an example, the proposed method achieves a 98.34% accuracy for a 5-way, 5-shot classification task for the Omniglot dataset (only 0.8% lower than software-based implementations) with a 3-bit MCAM. This represents a 13% accuracy improvement over state-of-the-art TCAM-based implementations at iso-energy and iso-delay. The presented distance function is resilient to the effects of FeFET device-to-device variations. Furthermore, this work experimentally demonstrates a 2-bit implementation of FeFET MCAM using AND arrays from GLOBALFOUNDRIES to further validate proof of concept.
- Research Article
12
- 10.1109/tpami.2019.2925347
- Jun 27, 2019
- IEEE transactions on pattern analysis and machine intelligence
Nearest neighbor search is a fundamental problem in computer vision and machine learning. The straightforward solution, linear scan, is both computationally and memory intensive in large-scale, high-dimensional cases, and hence is not preferable in practice. Therefore, there has been a lot of interest in algorithms that perform approximate nearest neighbor (ANN) search. In this paper, we propose a novel addition-based vector quantization algorithm, Asymmetric Mapping Quantization (AMQ), to efficiently conduct ANN search. Unlike existing addition-based quantization methods, which struggle with the problem caused by the norm of the database vector, we map the query vector and the database vector using different mapping functions to transform the computation of L2 distance into inner product similarity, and thus do not need to evaluate the norm of the database vector. Moreover, we further propose Distributed Asymmetric Mapping Quantization (DAMQ) to enable AMQ to work on very large datasets through distributed learning. Extensive experiments on approximate nearest neighbor search and image retrieval validate the merits of the proposed AMQ and DAMQ.
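The abstract does not give AMQ's mapping functions. As a rough illustration of the general idea only (mapping queries and database vectors differently so that an L2 ranking becomes an inner-product ranking), the sketch below uses a standard textbook reduction based on the identity ||q - x||^2 = ||q||^2 - 2 q·x + ||x||^2. The function names are hypothetical, and this reduction stores the squared norm as an extra coordinate at indexing time, which may differ from how AMQ handles norms.

```python
def map_query(q):
    """Query-side mapping: scale by -2 and append a constant 1."""
    return [-2.0 * v for v in q] + [1.0]


def map_database(x):
    """Database-side mapping: append the squared norm as an extra coordinate."""
    return list(x) + [sum(v * v for v in x)]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def nearest_by_inner_product(q, database):
    """Index of the L2-nearest database vector, found using inner products only.

    dot(map_query(q), map_database(x)) equals ||q - x||^2 - ||q||^2, and the
    ||q||^2 term is the same for every x, so the ranking is unchanged.
    """
    mq = map_query(q)
    return min(range(len(database)), key=lambda i: dot(mq, map_database(database[i])))


# Hypothetical toy data for illustration.
vectors = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (5.0, 5.0)]
closest = nearest_by_inner_product((2.0, 2.0), vectors)
```

In a real index the database-side mapping would be applied once when the data is stored rather than on every query.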
- Conference Article
3
- 10.1109/icsmc.2006.385263
- Oct 1, 2006
Broadcasting data with an index is an effective way to disseminate public information to a large number of clients. For a server, using multiple channels to provide services (e.g., location-based services) makes the broadcast cycle shorter than using one channel. Among location-based services, the k nearest neighbors (k-NN) search is an important one; it finds the k closest objects to a query point in a multi-dimensional space. This paper considers k nearest neighbors search on a broadcast R-tree in a multi-channel environment. We assume that a mobile client can only tune into one specified channel at any time instant. We study how a server generates the broadcast schedules on multiple channels and explore how a client executes the k-NN search on the broadcast. Different broadcast schedules, combined with the client-side k-NN search processing, yield different k-NN search protocols. The objectives of the protocols are to minimize the latency (i.e., the time elapsed between issuing and termination of the query), the tuning time (i.e., the amount of time spent listening to the channel), and the memory usage for k-NN search processing. Last, we present our experiments; the results validate that our mechanisms achieve these objectives.
- Research Article
12
- 10.1016/j.neucom.2015.11.104
- May 12, 2016
- Neurocomputing
HDIdx: High-dimensional indexing for efficient approximate nearest neighbor search
- Research Article
16
- 10.1016/j.jnca.2014.05.010
- Jun 10, 2014
- Journal of Network and Computer Applications
Scalable nearest neighbor query processing based on Inverted Grid Index
- Book Chapter
4
- 10.1137/1.9781611975994.173
- Jan 1, 2020
We study the k nearest neighbors problem in the plane for general, convex, pairwise disjoint sites of constant description complexity such as line segments, disks, and quadrilaterals and with respect to a general family of distance functions including the L_p-norms and additively weighted Euclidean distances. For point sites in the Euclidean metric, after four decades of effort, an optimal data structure has recently been developed with O(n) space, O(log n + k) query time, and O(n log n) preprocessing time [1, 17]. We develop a static data structure for the general setting with nearly optimal O(n log log n) space, the optimal O(log n + k) query time, and expected O(n polylog n) preprocessing time. The O(n log log n) space approaches the linear space, whose achievability is still unknown with the optimal query time, and improves the so far best O(n (log^2 n)(log log n)^2) space of Bohler et al.'s work [12]. Our dynamic version (that allows insertions and deletions of sites) also reduces the space of Kaplan et al.'s work [29] from O(n log^3 n) to O(n log n) while keeping O(log^2 n + k) query time and O(polylog n) update time, thus improving many applications such as dynamic bichromatic closest pair and dynamic minimum spanning tree in general planar metric, and shortest path tree and dynamic connectivity in disk intersection graphs. To achieve these improvements, we devise shallow cuttings of linear size for general distance functions. Shallow cuttings are a key technique to deal with the k nearest neighbors problem for point sites in the Euclidean metric. Agarwal et al. [4] already designed linear-size shallow cuttings for general distance functions, but their shallow cuttings could not be applied to the k nearest neighbors problem. Recently, Kaplan et al. [29] constructed shallow cuttings that are feasible for the k nearest neighbors problem, while the size of their shallow cuttings has an extra double logarithmic factor.
Our innovation is a new random sampling technique for the analysis of geometric structures. While our shallow cuttings seem, to some extent, merely a simple transformation of Agarwal et al.'s [4], the analysis requires our new technique to attain the linear size. Since our new technique provides a new way to develop and analyze geometric algorithms, we believe it is of independent interest.
- Research Article
27
- 10.1109/tmm.2021.3073811
- Mar 12, 2021
- IEEE Transactions on Multimedia
Nearest neighbor search and k-nearest neighbor graph construction are two fundamental issues that arise in many disciplines, such as multimedia information retrieval, data mining, and machine learning. They have become increasingly pressing with the emergence of big data in various fields in recent years. In this paper, a simple but effective solution for both approximate k-nearest neighbor search and approximate k-nearest neighbor graph construction is presented. These two issues are addressed jointly in our solution. On one hand, approximate k-nearest neighbor graph construction is treated as a search task. Each sample, along with its k-nearest neighbors, is joined into the k-nearest neighbor graph by performing the nearest neighbor search sequentially on the graph under construction. On the other hand, the built k-nearest neighbor graph is used to support k-nearest neighbor search. Since the graph is built online, dynamic updates on the graph, which are not possible for most existing solutions, are supported. This solution is feasible for various distance measures.
Its effectiveness as both a k-nearest neighbor graph construction approach and a k-nearest neighbor search approach is verified across different types of data at different scales, in various dimensions, and under different metrics.
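The joint idea described above (insert each sample by searching the graph built so far, then reuse that graph for later searches) can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm: it assumes Euclidean distance, a single fixed entry node, and a plain best-first search. The function names and the `entry` parameter are hypothetical.

```python
import heapq
import math


def graph_search(query, points, graph, k, entry=0):
    """Best-first search over the graph; returns up to k (distance, id) pairs, nearest first."""
    visited = {entry}
    d0 = math.dist(query, points[entry])
    candidates = [(d0, entry)]  # min-heap of nodes still to expand
    best = [(-d0, entry)]       # max-heap (negated distances) of the current top-k
    while candidates:
        d, u = heapq.heappop(candidates)
        if len(best) == k and d > -best[0][0]:
            break  # the closest unexpanded node is farther than the current k-th best
        for v in graph[u]:
            if v in visited:
                continue
            visited.add(v)
            dv = math.dist(query, points[v])
            if len(best) < k or dv < -best[0][0]:
                heapq.heappush(candidates, (dv, v))
                heapq.heappush(best, (-dv, v))
                if len(best) > k:
                    heapq.heappop(best)
    return sorted((-nd, v) for nd, v in best)


def build_knn_graph(points, k):
    """Insert samples one at a time, finding each one's neighbors by searching the current graph."""
    graph = {0: []}
    for i in range(1, len(points)):
        found = graph_search(points[i], points, graph, k)
        graph[i] = [v for _, v in found]
        for _, v in found:
            # Offer the new sample to each neighbor, which keeps only its k closest.
            nbrs = graph[v] + [i]
            nbrs.sort(key=lambda u: (math.dist(points[v], points[u]), u))
            graph[v] = nbrs[:k]
    return graph


# Hypothetical one-dimensional samples for illustration.
samples = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
knn_graph = build_knn_graph(samples, k=2)
hits = graph_search((3.2,), samples, knn_graph, k=2)
```

Because both construction and querying go through `graph_search`, adding a new sample later is the same operation as one step of the build loop. The result is approximate: a greedy search from a single entry node can miss true neighbors.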
- Research Article
- 10.5194/isprs-archives-xlii-2-w1-69-2016
- Oct 26, 2016
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. Nearest Neighbour (NN) is one of the important queries and analyses in spatial applications. In normal practice, a spatial access method structure is used during Nearest Neighbour query execution to retrieve information from the database. However, most spatial access method structures still face unresolved issues such as overlap among nodes and repetitive data entries. These issues cause excessive Input/Output (I/O) operations, which makes data retrieval inefficient. The situation becomes more critical when dealing with 3D data, whose size is usually large due to its detailed geometry and other attached information. In this research, a clustered 3D hierarchical structure is introduced as a 3D spatial access method structure. The structure is expected to improve the retrieval of Nearest Neighbour information for 3D objects. Several tests are performed on Single Nearest Neighbour search and k Nearest Neighbour (kNN) search. The tests indicate that the clustered hierarchical structure handles Nearest Neighbour queries more efficiently than its competitor. In the results, the clustered hierarchical structure reduced repetitive data entries and the number of accessed pages. The proposed structure also required minimal Input/Output operations, and its query response time was better than that of the competitor. As an outlook for this research, several possible applications are discussed and summarized.