Randomized approximate nearest neighbors algorithm

Peter Wilcox Jones, Andrei Osipov, Vladimir Rokhlin

doi:10.1073/pnas.1107769108


Open Access

https://doi.org/10.1073/pnas.1107769108


Abstract

We present a randomized algorithm for the approximate nearest neighbor problem in $d$-dimensional Euclidean space. Given $N$ points $\{x_j\}$ in $\mathbb{R}^d$, the algorithm attempts to find $k$ nearest neighbors for each of the $x_j$, where $k$ is a user-specified integer parameter. The algorithm is iterative, and its running time requirements are proportional to $T \cdot N \cdot (d \cdot \log d + k \cdot (d + \log k) \cdot \log N) + N \cdot k^2 \cdot (d + \log k)$, with $T$ the number of iterations performed. The memory requirements of the procedure are of the order $N \cdot (d + k)$. A by-product of the scheme is a data structure, permitting a rapid search for the $k$ nearest neighbors among $\{x_j\}$ for an arbitrary point $x \in \mathbb{R}^d$. The cost of each such query is proportional to $T \cdot (d \cdot \log d + \log(N/k) \cdot k \cdot (d + \log k))$, and the memory requirements for the requisite data structure are of the order $N \cdot (d + k) + T \cdot (d + N)$. The algorithm utilizes random rotations and a basic divide-and-conquer scheme, followed by a local graph search. We analyze the scheme's behavior for certain types of distributions of $\{x_j\}$ and illustrate its performance via several numerical examples.
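The abstract names three ingredients: random rotations, a divide-and-conquer split, and a local graph search. The following is a minimal Python sketch of the first two only, written under several assumptions that are not taken from the paper: the rotation is a dense random orthogonal matrix obtained from a QR factorization (costing on the order of d^2 per point rather than d log d), the split halves the points at the median of one coordinate per recursion level, and candidates from T independent iterations are merged by keeping the k closest. The function names `random_rotation`, `split_into_boxes`, and `approx_knn` are hypothetical, and the final graph-search refinement is omitted.

```python
import numpy as np

def random_rotation(d, rng):
    # Assumption: dense random orthogonal matrix via QR, used as a simple
    # stand-in for a fast structured rotation.
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # flip column signs; q stays orthogonal

def split_into_boxes(y, idx, k, depth, boxes):
    # Divide and conquer: halve the index set at the median of one
    # coordinate per level until a box holds fewer than 2*(k+1) points.
    if len(idx) < 2 * (k + 1):
        boxes.append(idx)
        return
    coord = depth % y.shape[1]
    order = idx[np.argsort(y[idx, coord], kind="stable")]
    half = len(order) // 2
    split_into_boxes(y, order[:half], k, depth + 1, boxes)
    split_into_boxes(y, order[half:], k, depth + 1, boxes)

def approx_knn(x, k, iterations, seed=0):
    # Returns (indices, squared distances) of k approximate nearest
    # neighbors for every row of x. Requires x.shape[0] >= k + 1.
    rng = np.random.default_rng(seed)
    n, d = x.shape
    best_idx = np.full((n, k), -1)
    best_dist = np.full((n, k), np.inf)
    for _ in range(iterations):
        y = x @ random_rotation(d, rng).T  # rotate every point
        boxes = []
        split_into_boxes(y, np.arange(n), k, 0, boxes)
        for box in boxes:
            # Rotations preserve distances, so measure them in x directly.
            diff = x[box][:, None, :] - x[box][None, :, :]
            dist = np.einsum("ijk,ijk->ij", diff, diff)
            np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
            for row, i in enumerate(box):
                # Merge this box's candidates with the best found so far.
                cand_idx = np.concatenate([best_idx[i], box])
                cand_dist = np.concatenate([best_dist[i], dist[row]])
                uniq, first = np.unique(cand_idx, return_index=True)
                keep = np.argsort(cand_dist[first])[:k]
                best_idx[i] = uniq[keep]
                best_dist[i] = cand_dist[first][keep]
    return best_idx, best_dist
```

Each extra iteration can only improve the candidate lists, since a new rotation places different points in the same box and the merge keeps the closest k seen so far. When the whole set fits in a single box (N at most 2k + 1 in this sketch), the result coincides with a brute-force search.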
