Abstract

A dataset exhibits the class imbalance problem when one class has very few examples compared to the other class; this is also referred to as between-class imbalance. Apart from between-class imbalance, within-class imbalance, where classes are composed of different numbers of sub-clusters and these sub-clusters contain different numbers of examples, may also affect the performance of a classifier. In this paper, we propose a method that handles between-class and within-class imbalance simultaneously and also takes into consideration various intrinsic characteristics of the data. The proposed method uses model-based clustering with respect to the classes to identify the sub-clusters present in the dataset, and then oversamples the examples in each sub-cluster in such a manner that between-class and within-class imbalance are eliminated simultaneously. We validate our approach using a neural network on ten publicly available datasets. The experimental results show the proposed method to be statistically significantly superior to the other methods considered.
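
The oversampling idea can be illustrated with a minimal sketch; it is not the procedure from the paper. It assumes the sub-cluster ids have already been produced by a clustering step run with respect to the classes, and it substitutes plain random oversampling with replacement for whatever sampling scheme the paper uses. The function name `balance_by_subcluster`, the tuple layout, and the rule for choosing target sizes are all hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict
from functools import reduce


def balance_by_subcluster(examples, seed=0):
    """Randomly oversample so that every class has the same total size and
    every sub-cluster within a class has the same size.

    `examples` is a list of (features, class_label, cluster_id) tuples.
    Assumption: cluster ids were assigned beforehand by a clustering step.
    """
    rng = random.Random(seed)

    # Group examples by class, then by sub-cluster within the class.
    groups = defaultdict(lambda: defaultdict(list))
    for features, label, cluster in examples:
        groups[label][cluster].append(features)

    # Smallest per-class total that requires no undersampling in any class.
    needed = max(
        len(clusters) * max(len(members) for members in clusters.values())
        for clusters in groups.values()
    )
    # Round up to a common multiple of the sub-cluster counts so the
    # per-class total splits evenly across each class's sub-clusters.
    step = reduce(
        lambda a, b: a * b // math.gcd(a, b),
        (len(clusters) for clusters in groups.values()),
    )
    class_total = -(-needed // step) * step  # ceiling to a multiple of step

    balanced = []
    for label, clusters in groups.items():
        per_cluster = class_total // len(clusters)
        for cluster, members in clusters.items():
            # Keep every original example, then draw the shortfall
            # from the same sub-cluster with replacement.
            shortfall = per_cluster - len(members)
            extra = [rng.choice(members) for _ in range(shortfall)]
            balanced.extend((f, label, cluster) for f in members + extra)
    return balanced
```

Under these assumptions, a majority class with sub-clusters of 6 and 3 examples and a minority class with sub-clusters of 2, 1, and 1 examples would both be brought to 12 examples, with sub-clusters of size 6 in the majority class and size 4 in the minority class.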
