Abstract

Imbalanced and overlapping class distributions pose several challenges for classification, including poor generalization, misleading accuracy, and inflated importance of the majority class. To address these challenges, we introduce GOS, a novel oversampling method for imbalanced and overlapping data that generates new samples from positive overlapping samples and thereby improves classification performance. First, GOS introduces a novel concept, the overlapping degree, which uses both local and global information from positive and negative samples. Second, the overlapping degree measures how much a positive sample contributes to the overlapping region, which helps to identify the positive overlapping samples. Finally, the identified positive overlapping samples are transformed into new positive samples using a transformation matrix derived from the distribution information of all positive samples. We compare GOS with 14 commonly used undersampling, oversampling, and advanced oversampling methods on 15 publicly available real-world imbalanced datasets, with sample sizes ranging from 178 to 2000 and imbalance ratios ranging from 2.02 to 41.4. The experimental results show that GOS outperforms these baselines, achieving average improvements of 3.2% in accuracy, 2.5% in G-mean, 4.5% in F1-score, and 5.2% in AUC.
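To make the evaluation measures concrete, the following minimal Python sketch computes the imbalance ratio, accuracy, G-mean, and F1-score for a binary problem. It is not part of GOS and not the authors' evaluation code. The function names are hypothetical, the imbalance ratio is assumed to follow the common convention of majority count divided by minority count, and the positive class is assumed to be the minority class labeled 1. The toy data are invented purely for illustration.

```python
from math import sqrt


def imbalance_ratio(labels, positive=1):
    """Majority/minority count ratio (assumed convention: negatives / positives)."""
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive samples")
    return n_neg / n_pos


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred, positive=1):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return (tp + tn) / (tp + fp + tn + fn)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(tpr * tnr)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Toy data: 2 positives and 18 negatives, so the imbalance ratio is 9.
y_true = [1] * 2 + [0] * 18
# A trivial classifier that always predicts the majority class.
y_majority = [0] * 20

print(imbalance_ratio(y_true))
print(accuracy(y_true, y_majority))
print(g_mean(y_true, y_majority))
print(f1_score(y_true, y_majority))
```

On this toy data the always-negative classifier reaches high accuracy while its G-mean and F1-score are both zero, which is the sense in which accuracy alone can be misleading on imbalanced data and why G-mean, F1-score, and AUC are reported alongside it.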
