Abstract

The nearest neighbor search problem in general dimensions finds application in computational geometry, computational statistics, pattern recognition, and machine learning. Although there is a significant body of work on theory and algorithms, surprisingly little work has been done on algorithms for high-end computing platforms, and no open-source library exists that can scale efficiently to thousands of cores. In this paper, we present algorithms and a library built on top of the Message Passing Interface (MPI) and OpenMP that enable nearest neighbor searches to scale to hundreds of thousands of cores for arbitrary-dimensional datasets. The library supports both exact and approximate nearest neighbor searches. The latter is based on iterative, randomized, and greedy KD-tree ($k$-dimensional tree) searches. We describe novel algorithms for the construction of the KD-tree, give complexity analysis, and provide experimental evidence for the scalability of the method. In our largest runs, we were able to perform an al...
