Abstract

The class imbalance problem has negative effects on the performance of feature selection in imbalanced data. Traditional feature selection algorithms always study on the balanced class distribution of the data and improve the overall classification accuracy for the optimization goal, which tends to be overwhelmed by the large classes, ignoring the small ones. This paper proposes a novel feature selection method based on the weighted mutual information (WMI) for the imbalanced data, defined as WMI algorithm. The WMI algorithm assigns different weights to the samples based on the fuzzy c-means (FCM) clustering algorithm and then calculates the mutual information based on the weight of each sample. This paper used the AUC as the evaluation criterion of the selected feature. At last, four unbalanced datasets from NASA software defect datasets are used to validate the proposed approach. Experimental results show that the proposed method achieves higher prediction accuracy of both minority class and majority class.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.