Abstract

The nearest-neighbor (NN) classifier has long been used in pattern recognition, exploratory data analysis, and data mining problems. A vital consideration in obtaining good results with this technique is the choice of distance function, and correspondingly which features to consider when computing distances between samples. In this paper, a new ensemble technique is proposed to improve the performance of the NN classifier. The proposed approach combines multiple NN classifiers, where each classifier uses a different distance function and potentially a different set of features (feature vector). These feature vectors are determined for each distance metric using a Simple Voting Scheme incorporated in Tabu Search (TS). The proposed ensemble classifier with different distance metrics and different feature vectors (TS-DF/NN) is evaluated on various benchmark data sets from the UCI Machine Learning Repository. Results indicate a significant increase in performance compared with various well-known classifiers. Furthermore, the proposed ensemble method is also compared with an ensemble classifier using different distance metrics but the same feature vector (with or without Feature Selection (FS)).

Highlights

  • The nearest-neighbor (NN) classifier has long been used in pattern recognition, exploratory data analysis, and data mining problems

  • Simple voting scheme is introduced in the cost function of Tabu Search

  • Since Diabetes has only 8 features, the proposed algorithm is unable to combine the benefits of Feature Selection and Ensemble Classifiers using different distance metrics


Introduction

The nearest-neighbor (NN) classifier has long been used in pattern recognition, exploratory data analysis, and data mining problems. The 1NN classifier is well explored in the literature and has been shown to have good classification performance on a wide range of real-world data sets [1, 2, 3]. Feature selection is a useful technique for improving the classification accuracy of the NN rule [7, 8]. The term feature selection refers to algorithms that select the best subset of the input feature set. These algorithms are used in the design of pattern classifiers that have three goals [9, 11]: 1. to reduce the cost of extracting features
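To make the ensemble idea concrete, the following is a minimal sketch (not the paper's implementation) of combining several 1NN classifiers by majority vote, where each member uses its own distance function and feature subset. In the proposed TS-DF/NN method the feature subsets are found by Tabu Search; here they are simply given as fixed lists, and the distance functions and helper names (`one_nn_predict`, `ensemble_predict`) are illustrative assumptions.

```python
from collections import Counter

def one_nn_predict(X_train, y_train, X_test, metric, features):
    """Classify each test sample by its single nearest training
    sample, measured with `metric` over the chosen feature subset."""
    preds = []
    for x in X_test:
        xs = [x[i] for i in features]
        dists = [metric(xs, [t[i] for i in features]) for t in X_train]
        preds.append(y_train[dists.index(min(dists))])
    return preds

def ensemble_predict(X_train, y_train, X_test, members):
    """Combine member 1NN classifiers, each a (metric, feature-subset)
    pair, by a simple majority vote over their predicted labels."""
    all_preds = [one_nn_predict(X_train, y_train, X_test, m, f)
                 for m, f in members]
    # For each test sample, take the most frequent label across members.
    return [Counter(col).most_common(1)[0][0] for col in zip(*all_preds)]

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))
```

A usage example: with members `[(euclidean, [0, 1]), (manhattan, [0, 2]), (chebyshev, [0, 1, 2])]`, each member votes on every test sample and disagreements are resolved by the majority, which is how combining diverse distance metrics can outperform any single NN classifier.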

