Abstract

Let k ≥ 0 be an integer. In the approximate k-flat nearest neighbor (k-ANN) problem, we are given a set $P \subset \mathbb{R}^d$ of n points in d-dimensional space and a fixed approximation factor c > 1. Our goal is to preprocess P so that we can efficiently answer approximate k-flat nearest neighbor queries: given a k-flat F, find a point in P whose distance to F is within a factor c of the distance between F and the closest point in P. The case k = 0 corresponds to the well-studied approximate nearest neighbor problem, for which a plethora of results are known, both in low and high dimensions. The case k = 1 is called approximate line nearest neighbor. In this case, we are aware of only one provably efficient data structure, due to Andoni, Indyk, Krauthgamer, and Nguyen (AIKN) [2]. For k ≥ 2, we know of no previous results.

We present the first efficient data structure that can handle approximate nearest neighbor queries for arbitrary k. We use a data structure for 0-ANN queries as a black box, and the performance depends on the parameters of the 0-ANN solution: suppose we have a 0-ANN structure with query time $O(n^{\rho})$ and space requirement $O(n^{1+\sigma})$, for ρ, σ > 0. Then we can answer k-ANN queries in time $O(n^{k/(k+1-\rho)+t})$ and space $O(n^{1+\sigma k/(k+1-\rho)} + n \log^{O(1/t)} n)$. Here, t > 0 is an arbitrary constant and the O-notation hides exponential factors in k, 1/t, and c and polynomials in d.

Our approach generalizes the techniques of AIKN for 1-ANN: we partition P into clusters of increasing radius, and we build a low-dimensional data structure for a random projection of P. Given a query flat F, the query can be answered directly in clusters whose radius is small compared to d(F, P) using a grid. For the remaining points, the low-dimensional approximation turns out to be precise enough.

Our new data structures also give an improvement in the space requirement over the previous result for 1-ANN: we can achieve near-linear space and sublinear query time, a further step towards practical applications where space constitutes the bottleneck.
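
To make the problem statement concrete, the following is a minimal sketch of the k-ANN query semantics only, not of the data structure described above. It answers a query by a brute-force linear scan and checks the c-approximation condition. The representation of a k-flat as an anchor point plus k direction vectors, and all function names, are assumptions made for illustration.

```python
import math

def orthonormal_basis(directions, eps=1e-12):
    """Gram-Schmidt: orthonormalize the direction vectors spanning the flat."""
    basis = []
    for v in directions:
        w = list(v)
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > eps:  # skip directions that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis

def dist_to_flat(p, anchor, basis):
    """Distance from point p to the flat {anchor + span(basis)}."""
    r = [pi - ai for pi, ai in zip(p, anchor)]
    for b in basis:
        coeff = sum(ri * bi for ri, bi in zip(r, b))
        r = [ri - coeff * bi for ri, bi in zip(r, b)]  # remove component along b
    return math.sqrt(sum(ri * ri for ri in r))

def exact_flat_nn(points, anchor, directions):
    """Brute-force baseline: exact nearest neighbor of the flat by linear scan."""
    basis = orthonormal_basis(directions)
    return min(points, key=lambda p: dist_to_flat(p, anchor, basis))

def is_c_approximate(candidate, points, anchor, directions, c):
    """True if candidate is a valid answer to a k-ANN query with factor c."""
    basis = orthonormal_basis(directions)
    best = min(dist_to_flat(p, anchor, basis) for p in points)
    return dist_to_flat(candidate, anchor, basis) <= c * best

# Example with k = 1: the query flat is the x-axis in R^3.
P = [(0.0, 0.0, 1.0), (5.0, 0.0, 3.0), (2.0, 2.0, 0.0)]
origin, x_axis = (0.0, 0.0, 0.0), [(1.0, 0.0, 0.0)]
nearest = exact_flat_nn(P, origin, x_axis)
```

With an empty list of directions (k = 0) the flat degenerates to a single point and `dist_to_flat` reduces to the ordinary Euclidean distance, which matches the remark that k = 0 is the standard approximate nearest neighbor problem. The linear scan takes time proportional to n per query, which is the cost a sublinear-time structure is meant to beat.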

