Abstract

Non-parametric methods such as the nearest neighbor classifier (NNC) and its variants, like the k-nearest neighbor classifier (k-NNC), are simple to use and often perform well in practice. These classifiers store all training patterns and search them to find the k nearest neighbors of a given test pattern. Some fundamental improvements to k-NNC are (i) the weighted k-nearest neighbor classifier (wk-NNC), where each neighbor is assigned a weight that is used in classification, and (ii) using a bootstrapped training set in place of the given training set. Hamamoto et al. [1] proposed a bootstrapping method in which a training pattern is replaced by a weighted mean of a few of its neighbors from its own class of training patterns; it is shown to improve classification accuracy in most cases. The time to create the bootstrapped set is O(n^2), where n is the number of training patterns. This paper presents a novel improvement to k-NNC called the k-nearest neighbor mean classifier (k-NNMC). k-NNMC finds the k nearest neighbors of the test pattern within each class of training patterns separately and computes the mean of each such set of k neighbors (class-wise). Classification is done according to the nearest mean pattern. It is shown experimentally, using several standard data-sets, that the proposed classifier achieves better classification accuracy than k-NNC, wk-NNC, and k-NNC using Hamamoto's bootstrapped training set. Further, the proposed method has no design phase, unlike Hamamoto's method; it is suitable for parallel implementation and can easily be coupled with any indexing or space-reduction method, making it well suited for data mining applications.
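The classification rule described above can be summarized in a short sketch. The following Python function (a hypothetical illustration, not the authors' implementation; the name knnmc_predict and the use of NumPy with Euclidean distance are assumptions) computes, for each class, the mean of the k training patterns nearest to the test pattern and assigns the class whose mean is closest:

```python
import numpy as np

def knnmc_predict(X_train, y_train, x_test, k=3):
    """k-NNMC rule: for each class, average the k training patterns
    nearest to x_test; predict the class whose mean is nearest."""
    best_class, best_dist = None, np.inf
    for label in np.unique(y_train):
        X_c = X_train[y_train == label]        # patterns of this class only
        # Euclidean distances from the test pattern to each class pattern
        d = np.linalg.norm(X_c - x_test, axis=1)
        kk = min(k, len(X_c))                  # guard against small classes
        nearest = X_c[np.argsort(d)[:kk]]      # k nearest within the class
        mean = nearest.mean(axis=0)            # class-wise neighbor mean
        dist = np.linalg.norm(mean - x_test)   # distance to that mean
        if dist < best_dist:
            best_class, best_dist = label, dist
    return best_class

# Example usage on toy data (values chosen only for illustration):
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knnmc_predict(X, y, np.array([0.2, 0.1]), k=2))  # -> 0
```

Because each class is processed independently, the per-class searches can run in parallel, which is consistent with the paper's claim that the method suits parallel implementations.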
