Abstract

Machine learning techniques are now employed in a wide range of applications, and classification is one of the most common tasks. A classification model, learned by running a classifier learning algorithm on a collected set of training examples, predicts the class label of a previously unseen example. In many practical applications, however, the collected training sets are class imbalanced: one class has significantly more examples than the other class(es), yet the minority class usually carries highly valuable information and is more important than the majority class. Most classifier learning algorithms are designed under the assumption that each class in a training set has approximately the same number of examples, so they often cannot achieve satisfactory classification performance on imbalanced data, especially for minority class examples. To address this problem, this paper proposes a Radial-Based Undersampling approach with Adaptive undersampling Ratio (RBU-AR). The main novelty of RBU-AR is that it attempts to determine a proper undersampling ratio from the class overlap complexity of the data, rather than adopting the default value of 1 or relying on empirical trial and error as many existing undersampling approaches do. Experiments are conducted on 30 benchmark imbalanced datasets and 10 artificial datasets. The results and corresponding statistical tests indicate that the degree of class overlap has a great influence on the achievable classification performance and is usually more important than the class imbalance ratio (IR), and that RBU-AR generally achieves highly competitive or better performance compared with several state-of-the-art approaches. This work therefore provides a theoretical guideline for determining the proper extent of undersampling by utilizing class overlap data complexity information.
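
To make the notions of imbalance ratio and undersampling ratio concrete, the following is a minimal sketch of plain random undersampling on a toy binary dataset. It is not RBU-AR: the radial-based selection and the overlap-driven choice of ratio are not reproduced here. The function names and the definition of `ratio` as the number of retained majority examples divided by the number of minority examples are assumptions made for illustration only.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (IR)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_undersample(examples, labels, ratio=1.0, seed=0):
    """Keep all minority examples and a random subset of the majority class.

    Assumption: `ratio` means (kept majority count) / (minority count),
    so ratio=1.0 yields a balanced binary training set.
    """
    counts = Counter(labels)
    majority = max(counts, key=counts.get)
    n_minority = min(counts.values())
    # Never try to keep more majority examples than actually exist.
    n_keep = min(counts[majority], round(ratio * n_minority))

    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    kept = set(rng.sample(majority_idx, n_keep))

    selected = [i for i, y in enumerate(labels) if y != majority or i in kept]
    return [examples[i] for i in selected], [labels[i] for i in selected]


# Toy binary dataset: 90 majority (label 0) and 10 minority (label 1) examples.
X = list(range(100))
y = [0] * 90 + [1] * 10

X_bal, y_bal = random_undersample(X, y, ratio=1.0)    # default-style full balancing
X_mild, y_mild = random_undersample(X, y, ratio=3.0)  # milder undersampling
```

With `ratio=1.0` the majority class is cut down to the size of the minority class, which corresponds to the default setting mentioned above; a larger value discards fewer majority examples. An adaptive method would choose this value from the data instead of fixing it in advance.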
