Recently developed classification models for imbalanced data focus mainly on majority-class samples. In addition, several whale optimization-based feature reduction models are inefficient for high-dimensional data, readily fall into local optima, and have difficulty reaching a globally optimal feature subset because of their high cost. To overcome these drawbacks, this study developed a two-stage feature reduction model for imbalanced data classification that uses fuzzy neighborhood rough sets (FNRS) and the binary whale optimization algorithm (BWOA). First, to capture the sample fuzziness of mixed data, a fuzzy neighborhood-based similarity measure between samples was defined and used to derive the similarity matrix and the fuzzy neighborhood granule, and a new FNRS model was presented by constructing lower and upper approximations. To account for the uneven distribution of classes, a boundary-based feature significance measure was developed to minimize the influence of uncertainty in the boundary region for imbalanced data. Second, fuzzy neighborhood roughness and decision entropy were investigated based on FNRS, and by integrating these measures, fuzzy neighborhood decision entropy was proposed to evaluate the fuzziness and roughness of the fuzzy neighborhood for imbalanced data. External and internal significance metrics were then proposed to obtain the preselected feature subset in the first stage. Third, in the second stage, a new control factor was defined to control the position of whales, and a novel fitness function was developed to evaluate the feature subset selected from imbalanced datasets. Thereafter, the immune regulation strategy from artificial immune systems was introduced into the BWOA to design a mixed selection probability for dividing the whale population. Two local interference strategies were applied to adjust the whale positions and prevent the BWOA from becoming trapped in local optima. Thus, an optimal feature subset was obtained by iterating the BWOA. Finally, a two-stage feature reduction algorithm was designed to handle imbalanced and high-dimensional data, in which the particle swarm optimization (PSO) algorithm was employed to determine the optimized parameters of the two-stage algorithm. Experiments conducted on 22 datasets revealed that the proposed algorithm is efficient for both two-class and multiclass datasets.
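
To make the first step more concrete, the following is a minimal sketch of how a fuzzy similarity matrix on mixed data and the resulting lower and upper approximations of a decision class might be computed. The abstract does not give the paper's formulas, so every definition here is an assumption borrowed from common fuzzy rough set practice: numeric features are assumed scaled to [0, 1] and compared with 1 - |x - y|, categorical features with exact match, per-feature similarities are combined with the minimum, and similarities below a hypothetical `threshold` parameter are set to zero. All function names are illustrative.

```python
from typing import List, Sequence, Tuple


def fuzzy_similarity(x: Sequence, y: Sequence, numeric: Sequence[bool]) -> float:
    """Assumed similarity: minimum over per-feature similarities."""
    sims = []
    for xi, yi, is_num in zip(x, y, numeric):
        if is_num:
            # Numeric feature, assumed to be scaled to [0, 1].
            sims.append(1.0 - abs(xi - yi))
        else:
            # Categorical feature: crisp match.
            sims.append(1.0 if xi == yi else 0.0)
    return min(sims)


def similarity_matrix(samples: Sequence[Sequence], numeric: Sequence[bool],
                      threshold: float) -> List[List[float]]:
    """Row i is the (assumed) fuzzy neighborhood granule of sample i."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = fuzzy_similarity(samples[i], samples[j], numeric)
            # Similarities below the threshold are treated as "not a neighbor".
            matrix[i][j] = s if s >= threshold else 0.0
    return matrix


def approximations(matrix: Sequence[Sequence[float]], labels: Sequence,
                   target) -> Tuple[List[float], List[float]]:
    """Classical fuzzy rough lower/upper approximations of a crisp class."""
    member = [1.0 if lab == target else 0.0 for lab in labels]
    lower, upper = [], []
    for row in matrix:
        lower.append(min(max(1.0 - r, m) for r, m in zip(row, member)))
        upper.append(max(min(r, m) for r, m in zip(row, member)))
    return lower, upper


# Toy mixed data: one numeric and one categorical feature per sample.
samples = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (0.8, "a")]
numeric = (True, False)
labels = [0, 1, 1, 1]  # class 0 is the minority class

matrix = similarity_matrix(samples, numeric, threshold=0.5)
lower, upper = approximations(matrix, labels, target=1)
# A large gap between upper and lower marks a sample in the boundary region.
boundary = [u - l for u, l in zip(upper, lower)]
print(boundary)
```

In this toy example the first two samples are close to each other but carry different labels, so they fall in the boundary region, which is the kind of uncertainty the boundary-based significance measure is described as minimizing.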