An optimal proximity method for nearest neighbor search in high dimensional data

Vadlamudi China Venkaiah, Raghunadh Pasunuri

doi:10.1109/ic3i.2016.7918012

Abstract

Nearest neighbor search is the basic operation behind similarity search in areas such as content-based image retrieval (CBIR), web search engines, microarray data analysis, and recommender systems. In this work we propose a data partitioning method based on multiple reference points. The method divides the data into disjoint partitions according to the distances from a set of reference points to the data objects, with the grouping chosen by an optimization criterion. For a given query, the k nearest neighbors (kNN) are retrieved by searching only the single partition group in which, according to the distance from a reference point, all of the query's nearest neighbors lie. We conducted experiments on six data sets: ZINC, AT&T (formerly ORL), Yale, GCM, Leukemia, and Lung. On each data set we compared the proposed method with three similar methods: mean reference, minmax reference, and set of reference points. The experimental results indicate that the proposed method performs better than these state-of-the-art NN search methods. By pruning the search space, it reduces the computation cost and achieves fast search. The method was also evaluated on a group of queries, and the results are promising.
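
The general idea of reference-point partitioning can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state the optimization criterion or how the reference points are chosen, so the sketch assumes a single reference point (the data mean), equal-size groups ordered by distance to that point, and Euclidean distance. The names `build_groups`, `knn_search`, and `group_size` are hypothetical. Groups are visited in order of a triangle-inequality lower bound, and the scan stops as soon as no unvisited group can contain a closer neighbor, so the answer is exact even when more than one group has to be read.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_groups(points, group_size):
    """Partition point indices into disjoint groups by distance to one reference point.

    Assumption: the reference point is the mean of the (non-empty) data set.
    """
    dim = len(points[0])
    ref = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    ranked = sorted((dist(p, ref), i) for i, p in enumerate(points))
    groups = [ranked[s:s + group_size] for s in range(0, len(ranked), group_size)]
    return ref, groups


def knn_search(points, ref, groups, query, k):
    """Return the k nearest (distance, index) pairs and the number of groups scanned."""
    dq = dist(query, ref)

    def lower_bound(group):
        # Each group is sorted by reference distance, so group[0][0] is its
        # minimum and group[-1][0] its maximum. By the triangle inequality,
        # d(query, x) >= |d(query, ref) - d(x, ref)| for every x in the group.
        return max(0.0, group[0][0] - dq, dq - group[-1][0])

    best = []      # sorted list of (distance, index), at most k entries
    scanned = 0
    for group in sorted(groups, key=lower_bound):
        # Groups come in increasing lower-bound order, so once one group
        # cannot beat the current k-th distance, none of the rest can.
        if len(best) == k and lower_bound(group) >= best[-1][0]:
            break
        scanned += 1
        best.extend((dist(query, points[i]), i) for _, i in group)
        best.sort()
        del best[k:]
    return best, scanned


# Small deterministic data set, made up for illustration only.
points = [((i * 37) % 101, (i * 53) % 103, (i * 71) % 107) for i in range(200)]
query = (50, 50, 50)
ref, groups = build_groups(points, group_size=20)
neighbors, scanned = knn_search(points, ref, groups, query, k=3)
print(neighbors, "groups scanned:", scanned, "of", len(groups))
```

With a single reference point the lower bound is often loose in high dimensions, which is presumably one reason to use multiple reference points and an optimized grouping; the sketch only shows how reference distances allow whole groups to be skipped.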

Full Text