Abstract

This paper addresses high-dimensional classification, a central problem in machine learning. When the number of features is very large relative to the number of training samples, the performance of a given classifier can degrade. One way to cope with this problem is to perform feature selection, reducing the number of features before classification. We propose a new hybrid feature selection algorithm based on interaction information that improves upon the information-guided incremental selection (IGIS) algorithm. Our improved method employs interaction information to select candidate features to be added to the current feature subset, and uses Cohen's $d$ as a significance test to decide whether a new feature is permanently added to the subset. We also adopt new stopping criteria that allow a more intensive search, making the method efficient while still able to find excellent solutions. Experimental results on eleven high-dimensional data sets show that, compared with other hybrid feature selection algorithms, our proposed algorithm provides high classification accuracy and requires a small number of features for classification.

Highlights

  • One of the solutions to cope with the curse of dimensionality in pattern recognition is to employ feature selection

  • For the artificial madelon data set with five informative features, the information-guided incremental selection (IGIS)+ algorithm selects an average of ten features and gives the highest test set accuracy rate of 79.32% among the five algorithms

  • The inverse sequential floating search method (iSFSM) is better than the IGIS algorithm and ranks second, while Smart-BT and IWSSr,s rank fourth and fifth, respectively


Summary

INTRODUCTION

One of the solutions to cope with the curse of dimensionality in pattern recognition is to employ feature selection. Wrapper methods [6], [7] employ the performance of a specific classifier to assess the usefulness of a selected subset during the search. Although they tend to yield higher performance results than filter methods do for the same number of selected features, wrapper methods have the disadvantage of being very time-consuming for high-dimensional data. The IGIS algorithm is a hybrid feature selection algorithm that employs interaction information to rank candidate features to be added to the selected subset. Experimental results on different types of high-dimensional data sets show that our proposed algorithm consistently outperforms prior hybrid feature selection algorithms on many data sets in terms of classification accuracy and the number of selected features.

INFORMATION THEORY
RESULTS AND DISCUSSION
CONCLUSIONS