Abstract

Combining multiple classifiers, known as ensemble learning, can give a substantial improvement in the prediction performance of learning algorithms, especially in the presence of non-informative features in the data. We propose an ensemble of a subset of kNN classifiers, ESkNN, for the classification task, constructed in two steps. First, we choose classifiers based upon their individual performance, measured by out-of-sample accuracy. The selected classifiers are then combined sequentially, starting from the best model, and assessed for collective performance on a validation data set. We evaluate the method on benchmark data sets, both in their original form and with added non-informative features. The results are compared with the usual kNN, bagged kNN, random kNN, the multiple feature subsets method, random forest, and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparably to random forest and support vector machines.
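
The two-step procedure described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes scikit-learn's KNeighborsClassifier, fits kNN models on random feature subsets, ranks them by out-of-sample accuracy, and then adds them one at a time (best first), keeping a model only if it improves majority-vote accuracy on a validation set. All names and parameters (n_models, subset_size, k) are illustrative assumptions.

```python
# Minimal sketch of the two-step ESkNN idea (illustrative, not the authors' code).
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def majority_vote(members, X):
    """Majority vote over member predictions (assumes integer-coded class labels)."""
    votes = np.stack([clf.predict(X[:, subset]) for subset, clf in members])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

def build_esknn(X_train, y_train, X_oos, y_oos, X_val, y_val,
                n_models=100, subset_size=5, k=3):
    # Step 1: fit kNN models on random feature subsets and rank them by
    # their individual out-of-sample accuracy.
    scored = []
    for _ in range(n_models):
        subset = rng.choice(X_train.shape[1], size=subset_size, replace=False)
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_train[:, subset], y_train)
        acc = accuracy_score(y_oos, clf.predict(X_oos[:, subset]))
        scored.append((acc, subset, clf))
    scored.sort(key=lambda t: t[0], reverse=True)

    # Step 2: add models sequentially, best first, keeping each one only
    # if it improves the ensemble's majority-vote accuracy on the
    # validation set (an assumed reading of "assessed for collective
    # performance").
    ensemble, best_acc = [], 0.0
    for _, subset, clf in scored:
        trial = ensemble + [(subset, clf)]
        acc = accuracy_score(y_val, majority_vote(trial, X_val))
        if acc > best_acc:
            ensemble, best_acc = trial, acc
    return ensemble
```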

Highlights

  • In supervised classification tasks, the aim is to construct a predictor that assigns a class label to new observations

  • The ensemble of a subset of kNN classifiers (ESkNN) gives the best overall results on 8 data sets; random forest is better than all other methods on 9 data sets; support vector machines (SVM) give the minimum classification error on 5 data sets; and random kNN (RkNN) outperforms the rest of the methods on one data set

  • When non-informative features are added to the data (Table 6), ESkNN gives a lower classification error than the other methods on 11 data sets; RF gives the best classification performance on 9 data sets; SVM gives better results on one data set; and on two data sets there is no clear winner between random forest and ESkNN. Overall, ESkNN performs better than the other k nearest neighbours (kNN) based methods and SVM


Introduction

The aim in supervised classification is to construct a predictor that assigns a class label to new observations. In many real-life classification problems, one often encounters imprecise data containing non-informative features, which dramatically increase the classification error of learning algorithms (Nettleton et al., 2010). To overcome this problem, feature selection methods are usually applied before classification to mitigate the effect of such non-informative features (Liu et al., 2014; Mahmoud et al., 2014). These methods search for the most discriminative feature subset of the original features, i.e. one that increases the classification performance of a classifier. This encourages combining the results of several of the best feature subsets, as sketched below.
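
One way to score a candidate feature subset is by the accuracy of a kNN classifier restricted to just those features. The sketch below uses cross-validated accuracy as a generic stand-in for the out-of-sample accuracy used in the paper; the function names and parameters (n_candidates, size, keep) are illustrative assumptions, not the paper's method.

```python
# Illustrative ranking of random feature subsets by kNN accuracy.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def subset_score(X, y, subset, k=3, folds=5):
    """Score one candidate feature subset by cross-validated kNN accuracy."""
    clf = KNeighborsClassifier(n_neighbors=k)
    return cross_val_score(clf, X[:, subset], y, cv=folds).mean()

def best_subsets(X, y, n_candidates=50, size=5, keep=10):
    """Draw random subsets and keep the highest-scoring ones for combination."""
    candidates = [rng.choice(X.shape[1], size=size, replace=False)
                  for _ in range(n_candidates)]
    candidates.sort(key=lambda s: subset_score(X, y, s), reverse=True)
    return candidates[:keep]
```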

