Abstract

For imbalanced data, classification efficiency degrades significantly due to the missing information for the positive class, and existing sampling schemes do not consider the distributions of samples. Additionally, the global parameters of fuzzy neighborhoods are set manually. These defects affect the effectiveness of classifier. To address these problems, we offer an adaptive fuzzy multi-neighborhood feature selection methodology with intercluster distance-based hybrid sampling for class-imbalanced data. First, the number of clusters can be defined in terms of the number of samples in the negative or positive class. The initial centers of the clusters are determined according to the number of clusters, and the dissimilarity and similarity measures are calculated by using the intercluster distances between samples. Then, the cluster center, fuzzy membership matrix, and intercluster distance are studied, and then the optimization objective function is designed. The hybrid sampling scheme can be used to combine the generated positive class samples and negative class samples and obtain a class-balanced system. Second, according to the sample distribution, the standard deviation and a set of adaptive fuzzy multi-neighborhood radii are designed. A fuzzy multi-neighborhood similarity relation is defined by introducing a Gaussian kernel model to obtain a fuzzy multi-neighborhood granule, and an improved fuzzy multi-neighborhood rough set model is provided. Uncertain measures of fuzzy neighborhood systems are evaluated by the positive region and dependency. Third, by integrating fuzzy dependence with fuzzy complementary condition entropy, fuzzy multi-neighborhood complementary mutual information is provided on two viewpoints of algebra and information. Finally, a heuristic feature subset selection methodology for imbalanced classification with hybrid sampling using fuzzy c-means clustering is studied to obtain this excellent set of features. Experiments on 26 imbalanced datasets show the effectiveness of our designed algorithm.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.