Abstract

Class imbalance is a persistent problem for machine learning and data mining researchers, and it arises in many real-world applications. Many methods have been developed to address it. However, many researchers believe that an imbalanced class distribution is not in itself the main factor degrading the performance of a classification model. Rather, performance suffers severely when class imbalance coexists with problems such as class overlap, small disjuncts, and noise. For data that is both class-imbalanced and class-overlapped, existing methods mainly use nearest neighbors to measure the local similarity of instances within a neighborhood and thereby locate the overlapping regions of the dataset. To the best of our knowledge, no prior work has considered global similarity. To capture the global similarity of a dataset, we propose a novel Schur decomposition class-overlap undersampling method (SDCU). SDCU identifies potentially overlapping instances based on global similarity and is the first method to use matrix decomposition to address class overlap in class-imbalanced data. We conduct comparative experiments on 46 publicly available real-world datasets. With AUC as the performance evaluation metric, SDCU shows clear advantages over other state-of-the-art methods on three different types of classifiers: SVM, CART, and 3NN. The results of the Friedman ranking and Holm's post-hoc test also support these conclusions.
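For orientation, the neighborhood-based approach that the abstract attributes to existing methods can be sketched as follows. This is not SDCU and is not taken from the paper: it is a minimal illustration of local-similarity overlap detection, assuming Euclidean distance and a simple rule (a majority instance is treated as overlapping if any of its k nearest neighbors belongs to another class). The function names `knn_overlap_flags` and `undersample`, the rule itself, and the toy data are all hypothetical.

```python
import math

def knn_overlap_flags(X, y, majority_label, k=3):
    """Flag majority instances whose k nearest neighbours include
    at least one instance of another class (assumed overlap rule)."""
    flags = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            # Minority instances are never removed in this sketch.
            flags.append(False)
            continue
        # Euclidean distances from instance i to every other instance.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        flags.append(any(y[j] != majority_label for j in neighbours))
    return flags

def undersample(X, y, flags):
    """Drop every flagged instance and return the reduced dataset."""
    keep = [i for i, flagged in enumerate(flags) if not flagged]
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data (illustrative only): four majority points in one cluster,
# one majority point sitting inside the minority cluster.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (1.0, 1.0), (1.1, 1.0), (1.05, 1.05)]
y = [0, 0, 0, 0, 0, 1, 1]

flags = knn_overlap_flags(X, y, majority_label=0, k=3)
X_res, y_res = undersample(X, y, flags)
```

In this toy example only the majority point at (1.0, 1.0) is flagged, because its neighborhood contains minority instances. The decision for each instance depends solely on its k neighbors, which is the local view of similarity that the abstract contrasts with the global similarity SDCU aims to capture.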
