Abstract

Obtaining informative features is crucial in imbalanced classification. However, existing neighborhood rough set-based feature selection approaches easily overlook the diversity and complexity of data distributions, and it is difficult to obtain the globally optimal feature subset from imbalanced, high-dimensional datasets. To tackle these limitations, we construct a new two-stage feature subset selection scheme for imbalanced data by fusing the fuzzy multi-neighborhood rough set (FMRS) with the binary whale optimization algorithm (BWOA). First, to evaluate the distributions of different features, the standard deviation coefficient is introduced to construct a fuzzy multi-neighborhood radius set. Then, the fuzzy multi-neighborhood granule and fuzzy membership degree are presented to establish the novel FMRS, and a feature significance measure from the algebraic view is developed to balance the approximate properties and influences of different features in the negative and positive classes. Second, fuzzy multi-neighborhood conditional entropy is defined to maximize the information quantity of class-imbalanced data from the information view; by fusing the two evaluation perspectives above, a mixed metric is provided to fully assess the uncertainty of class-imbalanced datasets. Internal and external significance metrics are designed to obtain a preselected candidate set of features based on the filter FMRS model in the first stage. Third, a control factor is developed to govern the whale position update, and a novel fitness function is constructed by fusing the dependency degree and entropy measure with the reduction ratio to evaluate the optimal subset of features. Population partitioning and local interference schemes are adopted to prevent the BWOA from becoming trapped in a local optimum. To reduce the evolutionary search space, a dynamic bitmask is used to improve the BWOA, and an optimal subset of features is then acquired through continuous iterations of the wrapper BWOA in the second stage. Finally, a new two-stage algorithm for feature subset selection fusing FMRS and BWOA is provided to process class-imbalanced data, where the particle swarm optimization algorithm determines the optimized parameters. Experiments on 31 datasets show that our algorithm is efficient and can achieve excellent classification efficiency for binary and multiclass imbalanced data.
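
The idea of a per-feature neighborhood radius driven by the standard deviation can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the form used here (each feature's standard deviation divided by a parameter `lam`) is an assumption borrowed from common neighborhood rough set practice, and the function names `multi_neighborhood_radii` and `neighborhood_mask` are hypothetical. The fuzzy membership degree of the FMRS is not modeled; only the crisp multi-radius neighborhood is shown.

```python
import numpy as np

def multi_neighborhood_radii(X, lam=2.0):
    """Return one radius per feature.

    Assumed form (not taken from the paper): std of the feature / lam.
    Features with a wider spread get a wider neighborhood.
    """
    X = np.asarray(X, dtype=float)
    return X.std(axis=0) / lam

def neighborhood_mask(X, radii):
    """Boolean matrix where mask[i, j] is True when sample j lies within
    the per-feature radius of sample i on every feature."""
    X = np.asarray(X, dtype=float)
    # Pairwise absolute differences per feature, shape (n, n, d).
    diff = np.abs(X[:, None, :] - X[None, :, :])
    # Broadcasting compares each feature difference with its own radius.
    return (diff <= radii).all(axis=2)

# Toy data: the second feature is spread far more widely than the first.
X = np.array([[0.0, 0.0],
              [0.1, 10.0],
              [1.0, 0.0]])
radii = multi_neighborhood_radii(X, lam=2.0)
mask = neighborhood_mask(X, radii)
```

With a single global radius, the widely spread second feature would dominate the neighborhood relation; a radius set scaled per feature is one way to avoid that, which appears to be the motivation stated above.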
