Abstract

$k$-Nearest Neighbors ($k$-NN) search is a widely used category of algorithms with applications in domains such as computer vision and machine learning. Despite the desire to process increasing amounts of high-dimensional data within these domains, $k$-NN algorithms scale poorly on multicore systems because they hit a memory wall. In this paper, we propose a novel data filtering strategy for $k$-NN search algorithms on multicore platforms. By excluding unlikely features during the $k$-NN search process, this strategy can reduce the amount of computation required as well as the memory footprint. It is complementary to the data selection strategies used in other state-of-the-art $k$-NN algorithms. A Subspace Clustering for Filtering (SCF) method is proposed to implement the data filtering strategy. Experimental results on four $k$-NN algorithms show that SCF can significantly improve their performance on three modern multicore platforms with only a small loss of search precision.
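To make the general idea of candidate filtering concrete, the sketch below shows a minimal approximate $k$-NN search that prunes the database with a coarse cluster assignment before computing exact distances. This is only an illustration of the filtering principle, not the paper's SCF method; the clustering scheme, the function names (`build_index`, `knn_filtered`), and the parameters `n_clusters` and `n_probe` are assumptions chosen for the example.

```python
import numpy as np

def build_index(data, n_clusters=16, seed=0):
    """Partition the database with a few Lloyd iterations (rough k-means)."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), n_clusters, replace=False)].copy()
    for _ in range(5):
        # assign each point to its nearest centroid
        assign = np.argmin(((data[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(n_clusters):
            members = data[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids, assign

def knn_filtered(query, data, centroids, assign, k=10, n_probe=4):
    """Filter: search only the n_probe clusters whose centroids are nearest the query."""
    d_cent = ((centroids - query) ** 2).sum(-1)
    probed = np.argsort(d_cent)[:n_probe]            # filtering step
    cand_idx = np.nonzero(np.isin(assign, probed))[0]
    d = ((data[cand_idx] - query) ** 2).sum(-1)      # exact distances on survivors only
    order = np.argsort(d)[:k]
    return cand_idx[order], np.sqrt(d[order])

# Usage: hypothetical random data standing in for a real feature set
data = np.random.rand(10000, 64).astype(np.float32)
query = np.random.rand(64).astype(np.float32)
centroids, assign = build_index(data)
ids, dists = knn_filtered(query, data, centroids, assign, k=5)
```

Because distances are computed only for the points that survive the filter, both the arithmetic work and the volume of data streamed from memory shrink, at the cost of a small loss in recall when a true neighbor falls in an unprobed cluster.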
