Abstract

Learning from imbalanced data has become a major challenge in data mining and machine learning. Oversampling is an effective way to restore class balance by generating new minority-class samples. However, most oversampling methods perform poorly in the presence of noise and complicated distribution structures, and they easily generate redundant, unsafe, or outlier samples. To address this problem, we propose a novel oversampling method, the Improved and Random Synthetic Minority Oversampling Technique (IR-SMOTE). The core idea of IR-SMOTE is three-fold: (1) the minority class is partitioned by k-means clustering, and noise samples in each resulting cluster are removed by sorting the majority class samples in ascending order; (2) the number of synthetic samples is adaptively assigned to each minority-class cluster by means of kernel density estimation; and (3) based on the attributes of temporary synthetic samples obtained with random-SMOTE, a new synthesizing method is developed to generate new samples with guaranteed diversity. Comparison experiments on 18 well-known data sets illustrate the effectiveness and general applicability of the proposed IR-SMOTE method for imbalanced data classification.
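As a rough illustration of steps (2) and (3), the following minimal Python sketch allocates synthetic samples to minority-class clusters with a Gaussian kernel density estimate and then draws samples by random-SMOTE-style interpolation. It is not the IR-SMOTE procedure itself: the abstract does not state the allocation rule or the final synthesizing formula, so the inverse-density weighting, the fixed bandwidth, and all function names here are assumptions made for illustration. Noise removal and k-means clustering (step 1) are skipped, and the clusters are taken as given.

```python
import math
import random


def gaussian_kde_density(point, samples, bandwidth=1.0):
    """Gaussian kernel density estimate at `point` from `samples`."""
    d = len(point)
    norm = (2 * math.pi) ** (d / 2) * bandwidth ** d
    total = 0.0
    for s in samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(point, s))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / (len(samples) * norm)


def allocate_by_sparsity(clusters, n_new, bandwidth=1.0):
    """Split n_new across clusters; sparser clusters get more (assumed rule)."""
    all_points = [p for c in clusters for p in c]
    mean_density = [
        sum(gaussian_kde_density(p, all_points, bandwidth) for p in c) / len(c)
        for c in clusters
    ]
    weights = [1.0 / d for d in mean_density]  # assumption: inverse density
    total = sum(weights)
    counts = [int(n_new * w / total) for w in weights]
    # Hand the rounding remainder to the highest-weight clusters.
    remainder = n_new - sum(counts)
    order = sorted(range(len(clusters)), key=lambda i: weights[i], reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    return counts


def random_smote_sample(cluster, rng):
    """One random-SMOTE-style sample built via a temporary point."""
    x = rng.choice(cluster)
    y1, y2 = rng.choice(cluster), rng.choice(cluster)
    a = rng.random()
    # Temporary sample on the segment between y1 and y2.
    temp = [p + a * (q - p) for p, q in zip(y1, y2)]
    b = rng.random()
    # New sample on the segment between x and the temporary sample.
    return [p + b * (t - p) for p, t in zip(x, temp)]


def oversample(clusters, n_new, seed=0):
    """Generate n_new synthetic samples, cluster by cluster."""
    rng = random.Random(seed)
    counts = allocate_by_sparsity(clusters, n_new)
    return [
        random_smote_sample(c, rng)
        for c, k in zip(clusters, counts)
        for _ in range(k)
    ]


if __name__ == "__main__":
    dense = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
    sparse = [[5.0, 5.0], [8.0, 8.0], [5.0, 8.0]]
    print(allocate_by_sparsity([dense, sparse], 10))
    print(oversample([dense, sparse], 10)[:2])
```

In this toy setup the sparse cluster receives more of the synthetic samples than the dense one, and every generated point is a convex combination of points from its own cluster, so it stays inside that cluster's extent.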
