Abstract

Feature selection methods are used in many applications of machine learning, bioinformatics, pattern recognition and network traffic analysis. In high-dimensional datasets, redundant features and the curse of dimensionality cause a learning method to take a significant amount of time while the performance of the model decreases. To overcome these problems, feature selection techniques are used to select a subset of relevant and non-redundant features. However, most feature selection methods are unstable: for different training datasets, a feature selection method selects different subsets of features that yield different classification accuracies. In this paper, we present an ensemble feature selection method that uses feature–class and feature–feature mutual information to select an optimal subset of features by combining multiple feature subsets. The method is validated using four classifiers, viz. decision trees, random forests, KNN and SVM, on fourteen UCI, five gene expression and two network datasets.

Highlights

  • Feature selection is used to select a subset of relevant and non-redundant features from a large feature space

  • Many filter-based feature selection methods exist, such as CFS, gain ratio, information gain and ReliefF

  • The proposed EFS-MI feature selection method is implemented in MATLAB 2008


Summary

Introduction

Feature selection is used to select a subset of relevant and non-redundant features from a large feature space. In an ensemble-based feature selection method, multiple feature subsets are combined, typically through a combination of feature rankings, to select an optimal subset of features that improves classification accuracy. Ensemble-based feature selection is commonly used to obtain a stable feature set. The filter approach [9] selects a subset of features from high-dimensional datasets without using a learning algorithm. Canedo et al. [5] tested an ensemble-based feature selection method whose combiner takes the union of the feature subsets generated by multiple filter methods. We are motivated to improve the classification accuracy of classifiers using an ensemble method that combines different feature subsets based on feature-class and feature-feature mutual information. Different filters can be ensembled to overcome the bias or limitations of individual filters and to provide consistent performance over a wide range of applications.
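To make the idea concrete, the following is a minimal sketch of mutual-information-based feature selection with a union combiner. It is not the paper's EFS-MI algorithm; the greedy relevance-minus-redundancy criterion (mRMR-style) and the union combiner are assumptions chosen to illustrate how feature-class and feature-feature mutual information can be used together, and all function names (`mutual_info`, `select_features`, `ensemble_select`) are hypothetical.

```python
import math
from collections import Counter

def mutual_info(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def select_features(X, y, k):
    """Greedily pick k feature indices: maximize feature-class MI while
    penalizing the mean feature-feature MI with already-selected features.
    X is a list of rows; y is the class label per row."""
    n_feat = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_feat)]
    relevance = [mutual_info(col, y) for col in cols]
    selected = []
    while len(selected) < k:
        best, best_score = None, -float("inf")
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = (sum(mutual_info(cols[j], cols[s]) for s in selected)
                          / len(selected)) if selected else 0.0
            if relevance[j] - redundancy > best_score:
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
    return selected

def ensemble_select(datasets, k):
    """Run the filter on several training samples and combine the resulting
    subsets by union, as in the combiner tested by Canedo et al. [5]."""
    union = set()
    for X, y in datasets:
        union.update(select_features(X, y, k))
    return sorted(union)
```

With a feature that duplicates the class label and a redundant copy of it, the redundancy penalty steers the second pick toward an independent feature rather than the copy, which is the instability-reducing behavior the ensemble then aggregates across training samples.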

Motivation and problem definition
Related work
Result
Experimental results
Result analysis
Conclusion and future work
