Abstract

For imbalanced data, classification performance degrades significantly because information about the positive class is insufficient, and existing sampling schemes do not consider the distribution of the samples. In addition, the global parameters of fuzzy neighborhoods are set manually. These defects reduce the effectiveness of the classifier. To address these problems, we propose an adaptive fuzzy multi-neighborhood feature selection methodology with intercluster distance-based hybrid sampling for class-imbalanced data. First, the number of clusters is defined in terms of the number of samples in the negative or positive class. The initial cluster centers are determined according to the number of clusters, and the dissimilarity and similarity measures are calculated from the intercluster distances between samples. The cluster center, fuzzy membership matrix, and intercluster distance are then studied, and an optimization objective function is designed. The hybrid sampling scheme combines the generated positive class samples with the negative class samples to obtain a class-balanced system. Second, according to the sample distribution, the standard deviation and a set of adaptive fuzzy multi-neighborhood radii are designed. A fuzzy multi-neighborhood similarity relation is defined by introducing a Gaussian kernel model to obtain a fuzzy multi-neighborhood granule, and an improved fuzzy multi-neighborhood rough set model is provided. Uncertainty measures of fuzzy neighborhood systems are evaluated by the positive region and dependency. Third, by integrating fuzzy dependency with fuzzy complementary conditional entropy, fuzzy multi-neighborhood complementary mutual information is provided from both the algebraic and the information viewpoints. Finally, a heuristic feature subset selection methodology for imbalanced classification with hybrid sampling using fuzzy c-means clustering is developed to obtain a high-quality feature subset.
Experiments on 26 imbalanced datasets demonstrate the effectiveness of the designed algorithm.
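
As background for the clustering step, the textbook fuzzy c-means updates for the membership matrix and the cluster centers can be sketched as below. This is a minimal sketch of standard fuzzy c-means only. It does not include the intercluster-distance term of the objective function described above, whose exact form is not given in this abstract. The function names `fcm_memberships` and `fcm_centers` and the fuzzifier value `m = 2` are illustrative assumptions, not taken from the paper.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership update (assumed fuzzifier m > 1).

    Returns one row per point; each row holds that point's membership
    in every cluster and sums to 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers: share full
            # membership among those centers to avoid division by zero.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [
            1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists
        ]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Standard fuzzy c-means center update: membership-weighted means."""
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for j in range(n_clusters):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append([
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ])
    return centers
```

Alternating these two updates until the centers stop moving gives the usual fuzzy c-means iteration; a hybrid sampling scheme such as the one described above would then operate on the resulting clusters.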
