Abstract

A new filter method is proposed for feature selection and ranking that incorporates a novel mutual information with Gaussian gain for evaluating the relationships between features and the class, as well as among the features themselves. The new mutual information is derived, following the axioms of classical information theory, from the recently introduced non-extensive entropy with Gaussian gain. A defining characteristic of this entropy is its non-linearity when representing correlated information in natural texture images described by sparse probability distributions. In this work, we trace this property in our new mutual information in the context of correlated random variables associated with real-world datasets. The non-linearity of the Gaussian function embedded in the mutual information formula is exploited to identify the most important features, in the correct rank order, right at the outset of the incremental feature selection algorithm. This leads to the formation of smaller ranked feature subsets that yield the highest classification accuracies. Extensive experimentation on twenty benchmark datasets from the UCI repository, along with comparison to the state of the art, confirms the efficacy of our approach. An automated optimal feature-subset selection procedure is also proposed, based on a simple statistical test on the new measure.
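To make the ranking scheme concrete, the sketch below shows a generic incremental, mutual-information-based filter selector of the kind the abstract describes. It is a minimal sketch, not the authors' method: standard Shannon mutual information from scikit-learn stands in for the proposed Gaussian-gain mutual information (whose formula is not reproduced in the abstract), the mRMR-style relevance-minus-redundancy criterion is an assumed stand-in for the paper's selection rule, and the inputs are assumed to be discretized (pre-binned) features.

```python
# Minimal sketch of an incremental MI-based filter feature selector.
# ASSUMPTIONS: Shannon MI (mutual_info_score) is a placeholder for the
# paper's Gaussian-gain mutual information; the relevance-minus-redundancy
# (mRMR-style) criterion is illustrative, not the paper's exact rule;
# columns of X are assumed to hold discrete (pre-binned) values.
import numpy as np
from sklearn.metrics import mutual_info_score


def mi(a, b):
    # Placeholder for the Gaussian-gain mutual information between a and b.
    return mutual_info_score(a, b)


def incremental_select(X, y, k):
    """Greedily rank k features: at each step pick the feature most
    relevant to the class y and least redundant with those already
    selected, so the strongest features surface first."""
    n_features = X.shape[1]
    # Relevance of each feature to the class, computed once up front.
    relevance = np.array([mi(X[:, j], y) for j in range(n_features)])
    selected, remaining = [], set(range(n_features))
    while len(selected) < k and remaining:
        best_j, best_score = None, -np.inf
        for j in remaining:
            # Mean MI with already-selected features penalizes redundancy.
            redundancy = (np.mean([mi(X[:, j], X[:, s]) for s in selected])
                          if selected else 0.0)
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
    return selected  # feature indices in rank order
```

The first iteration simply ranks by relevance to the class, which mirrors the abstract's claim that the most important features are identified in the correct rank order at the outset; subsequent iterations penalize redundancy so that the resulting ranked subsets stay small. The abstract's automated stopping rule (a statistical test on the new measure) is not modeled here.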
