Feature selection is widely used across many fields as a key means of dimensionality reduction. Existing feature selection algorithms use only a single linear or nonlinear correlation measure when evaluating the relationships between variables, which limits the kinds of dependency they can capture. To account for the complexity of the relationships between features, we construct a novel feature selection evaluation function, CONMI, which combines the Pearson correlation coefficient (linear) and normalized mutual information (nonlinear) to comprehensively portray the dependencies between features and class variables. Building on CONMI, we further propose the CONMI_FS algorithm, which selects the optimal feature subset that is highly correlated with the class variables and has low redundancy among the selected features. CONMI_FS is compared with four methods on 20 datasets and evaluated in terms of reduction rate, classification accuracy, precision, and recall using KNN, SVM, and DT classifiers. The experimental results show that CONMI_FS obtains the highest reduction rate, 80.04%, and achieves the best classification accuracy on the KNN and SVM classifiers, 88.83% and 88.98%, respectively. These results indicate that CONMI_FS is competitive.
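To make the idea concrete, the following is a minimal sketch of a combined linear/nonlinear dependency score and a greedy relevance-minus-redundancy selection. It is not the paper's definition of CONMI or CONMI_FS: the abstract does not give the formula, so the equal-weight average of |Pearson| and normalized mutual information, the equal-width discretization, the geometric-mean normalization, the stopping rule (a fixed `k`), and all function names (`combined_score`, `greedy_select`, and so on) are assumptions made purely for illustration.

```python
import numpy as np

def entropy(codes):
    """Shannon entropy (bits) of a discrete integer-coded array."""
    _, counts = np.unique(codes, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def discretize(x, bins=5):
    """Equal-width binning into codes 0..bins-1 (an assumed choice)."""
    inner_edges = np.linspace(x.min(), x.max(), bins + 1)[1:-1]
    return np.digitize(x, inner_edges)

def normalized_mi(a, b):
    """NMI of two non-negative integer-coded arrays, MI / sqrt(H(a) * H(b))."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0
    joint = entropy(a * (b.max() + 1) + b)  # one unique code per (a, b) pair
    return max(h_a + h_b - joint, 0.0) / np.sqrt(h_a * h_b)

def abs_pearson(u, v):
    """Absolute Pearson correlation; 0 if either input is constant."""
    if np.std(u) == 0 or np.std(v) == 0:
        return 0.0
    return float(abs(np.corrcoef(u, v)[0, 1]))

def combined_score(u, v, u_codes, v_codes):
    """Assumed combination: plain average of the linear and nonlinear terms."""
    return 0.5 * (abs_pearson(u, v) + normalized_mi(u_codes, v_codes))

def greedy_select(X, y, k, bins=5):
    """Pick k features, each maximizing relevance to y minus mean redundancy
    with the features already selected (an mRMR-style rule, assumed here)."""
    n_features = X.shape[1]
    y_codes = np.unique(y, return_inverse=True)[1]
    D = np.column_stack([discretize(X[:, j], bins) for j in range(n_features)])
    relevance = [
        combined_score(X[:, j], y_codes, D[:, j], y_codes)
        for j in range(n_features)
    ]
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, n_features):
        best_j, best_val = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([
                combined_score(X[:, j], X[:, s], D[:, j], D[:, s])
                for s in selected
            ])
            val = relevance[j] - redundancy
            if val > best_val:
                best_j, best_val = j, val
        selected.append(best_j)
    return selected
```

Calling `greedy_select(X, y, k)` on a numeric feature matrix `X` and a label vector `y` returns the column indices in the order they were chosen. A near-duplicate of an already selected feature scores high on redundancy and is therefore deferred in favor of a less correlated feature that still carries information about the class.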