Abstract

Feature selection is an important preprocessing stage in signal processing and machine learning. Feature selection methods choose the most informative feature subset for classification. Mutual information and conditional mutual information are used extensively in feature selection methods. However, mutual information suffers from an overestimation problem, while conditional mutual information suffers from an underestimation problem. To address these issues, we introduce a new measure named part mutual information, which can accurately quantify direct associations among variables. The proposed method selects the candidate feature that maximizes the cumulative sum of the part mutual information between that feature and the class labels, conditioned on each already-selected feature. To evaluate its classification performance, the proposed method is compared with four state-of-the-art feature selection methods on twelve real-world data sets. Extensive experiments demonstrate that our method outperforms the four compared methods in terms of both average and highest classification accuracy.
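As a hedged illustration of the selection scheme the abstract describes, the sketch below implements a greedy forward-selection loop that, at each step, adds the candidate maximizing the cumulative sum of a conditional dependence score between the candidate and the class labels given each already-selected feature. Note one substitution: the paper's part mutual information estimator is not specified here, so plain discrete conditional mutual information is used as a stand-in scoring function; the names `cmi` and `greedy_select` are illustrative, not from the paper.

```python
from collections import Counter
from math import log

def cmi(x, y, z):
    """Plug-in estimate of conditional mutual information I(X;Y|Z), in nats,
    from three equal-length sequences of discrete values.
    (A stand-in for the paper's part mutual information measure.)"""
    n = len(x)
    pxyz = Counter(zip(x, y, z))
    pxz = Counter(zip(x, z))
    pyz = Counter(zip(y, z))
    pz = Counter(z)
    total = 0.0
    for (xi, yi, zi), c in pxyz.items():
        p_joint = c / n
        # I(X;Y|Z) = sum p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ]
        total += p_joint * log((p_joint * (pz[zi] / n)) /
                               ((pxz[(xi, zi)] / n) * (pyz[(yi, zi)] / n)))
    return total

def greedy_select(features, labels, k):
    """Greedy forward selection of k features.
    features: list of discrete feature columns (each a sequence of values).
    The first feature is picked by plain mutual information I(f; C),
    computed here as CMI with a constant conditioning variable; later
    features maximize sum_{s in selected} CMI(f; C | s)."""
    const = [0] * len(labels)
    selected = []
    candidates = list(range(len(features)))
    while candidates and len(selected) < k:
        def score(j):
            if not selected:
                return cmi(features[j], labels, const)
            return sum(cmi(features[j], labels, features[s]) for s in selected)
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, given a feature column identical to the labels and two uninformative columns, the loop would pick the informative column first. Substituting a part mutual information estimator for `cmi` recovers the structure of the proposed criterion.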
