Abstract

Since minority samples are substantially less common than majority samples, many industrial applications, such as credit card fraud detection (CCFD) and defective part identification, call for imbalanced classification. The performance of a classifier tends to suffer from the noisy samples in majority or minority classes. This work proposes a new undersampling scheme, called a clustering-based noisy-sample-removed undersampling scheme (NUS) for imbalanced classification. The majority class samples are first clustered. The distance of the majority class sample from the cluster center that is furthest away is used as the radius to build a hypersphere, with each cluster’s center assumed to be a spherical center. We determine the Euclidean distance between the center of a cluster and each minority sample to find whether they are in the hypersphere or not. Afterward, we exclude noisy samples from the minority class. The noisy samples of majority classes are removed by using the same procedure. Second, we propose an NUS, which combines noisy sample removal with undersampling techniques. Finally, to prove the effectiveness of NUS, we integrate NUS with the basic classifiers random forest (RF), decision tree (DT), and logistics regression (LR). We conduct their comparison with seven undersampling, oversampling, and noisy-sample-removed methods. This work performs experiments on 13 public and three real transaction datasets related to e-commerce. The results show that NUS plays a positive role in promoting existing classifiers’ performance.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.