Abstract

The nearest-neighbor (NN) classifier has long been used in pattern recognition, exploratory data analysis, and data mining problems. A vital consideration in obtaining good results with this technique is the choice of distance function, and correspondingly which features to consider when computing distances between samples. In this paper, a new ensemble technique is proposed to improve the performance of the NN classifier. The proposed approach combines multiple NN classifiers, where each classifier uses a different distance function and potentially a different set of features (feature vector). These feature vectors are determined for each distance metric using a Simple Voting Scheme incorporated in Tabu Search (TS). The proposed ensemble classifier with different distance metrics and different feature vectors (TS-DF/NN) is evaluated on various benchmark data sets from the UCI Machine Learning Repository. Results indicate a significant increase in performance compared with various well-known classifiers. Furthermore, the proposed ensemble method is also compared with an ensemble classifier using different distance metrics but the same feature vector (with or without Feature Selection (FS)).

Highlights

  • The nearest-neighbor (NN) classifier has long been used in pattern recognition, exploratory data analysis, and data mining problems

  • Simple voting scheme is introduced in the cost function of Tabu Search

  • Since Diabetes has only 8 features, the proposed algorithm is unable to combine the benefits of Feature Selection and Ensemble Classifiers using different distance metrics


Introduction

The nearest-neighbor (NN) classifier has long been used in pattern recognition, exploratory data analysis, and data mining problems. The 1NN classifier is well explored in the literature and has been shown to have good classification performance on a wide range of real-world data sets [1, 2, 3]. Feature selection is a useful technique for improving the classification accuracy of the NN rule [7, 8]. The term feature selection refers to algorithms that select the best subset of the input feature set. These algorithms are used in the design of pattern classifiers that have three goals [9, 11]: 1. to reduce the cost of extracting features
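To make the ensemble idea concrete, the following is a minimal sketch (not the paper's implementation) of combining several 1NN classifiers by majority vote, where each member uses its own distance function and feature subset. In the proposed TS-DF/NN method the feature subsets are found by Tabu Search; here they are simply given as fixed lists, and the distance functions and helper names (`one_nn_predict`, `ensemble_predict`) are illustrative assumptions.

```python
from collections import Counter

def one_nn_predict(X_train, y_train, X_test, metric, features):
    """Classify each test sample by its single nearest training
    sample, measured with `metric` over the chosen feature subset."""
    preds = []
    for x in X_test:
        xs = [x[i] for i in features]
        dists = [metric(xs, [t[i] for i in features]) for t in X_train]
        preds.append(y_train[dists.index(min(dists))])
    return preds

def ensemble_predict(X_train, y_train, X_test, members):
    """Combine member 1NN classifiers, each a (metric, feature-subset)
    pair, by a simple majority vote over their predicted labels."""
    all_preds = [one_nn_predict(X_train, y_train, X_test, m, f)
                 for m, f in members]
    # For each test sample, take the most frequent label across members.
    return [Counter(col).most_common(1)[0][0] for col in zip(*all_preds)]

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))
```

A usage example: with members `[(euclidean, [0, 1]), (manhattan, [0, 2]), (chebyshev, [0, 1, 2])]`, each member votes on every test sample and disagreements are resolved by the majority, which is how combining diverse distance metrics can outperform any single NN classifier.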

