Abstract

The K-nearest neighbor (K-NN) algorithm is a classification method grounded in statistical theory. It typically uses the Euclidean distance as the similarity measure, computed over all attributes of an instance. A practical issue in applying the K-NN algorithm is therefore that every attribute contributes equally to the distance between instances, whether or not it is relevant to classification. One appealing remedy is to weight each attribute differently when computing the distance between two instances; the contribution of each feature can then be determined through feature-weight learning. A second issue is that the value of K must usually be found by trial, testing a range of candidate values. To avoid this search for K in nearest neighbor experiments and to improve both accuracy and efficiency, this paper proposes a validity function for judging a clustering when the class structure of the data set is known, and applies it to classification problems such as K-NN in combination with supervised classification. As a result, only the single nearest neighbor (1-NN) needs to be selected, which yields more precise classification, avoids the search for K, greatly reduces query complexity, and improves the efficiency of the nearest neighbor algorithm. The nearest neighbor algorithm is also one of the most basic techniques in case-based reasoning (CBR), where case-base maintenance (CBM) is an important issue for keeping the case base efficient. This paper therefore also proposes a new approach to selecting representative cases based on the generalization capability of each case. With this method, most redundant cases, which degrade solution accuracy, can be deleted, improving indexing efficiency when searching for near neighbors.
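
The attribute-weighted distance and the 1-NN decision rule described above can be illustrated with a short sketch. The following Python example is a minimal illustration under stated assumptions, not the paper's actual method: the weight vector `w` is assumed to have already been produced by some feature-weight learning procedure, and the function names are hypothetical.

```python
import numpy as np

def weighted_euclidean(x, y, w):
    """Euclidean distance in which each attribute contributes
    according to its weight w[i] (assumed learned elsewhere)."""
    return np.sqrt(np.sum(w * (x - y) ** 2))

def one_nn_classify(query, cases, labels, w):
    """1-NN rule: return the label of the single closest case,
    so no search over candidate K values is needed."""
    dists = [weighted_euclidean(query, c, w) for c in cases]
    return labels[int(np.argmin(dists))]

# Toy usage: two features, the first weighted more heavily.
cases = np.array([[1.0, 5.0], [2.0, 1.0], [8.0, 2.0]])
labels = np.array(["a", "a", "b"])
w = np.array([0.8, 0.2])  # hypothetical learned feature weights
print(one_nn_classify(np.array([7.5, 3.0]), cases, labels, w))  # -> "b"
```

Because the decision uses only the nearest case, the per-query cost is a single pass over the case base, which is also why pruning redundant cases (as in the CBM approach above) directly improves retrieval efficiency.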
