Abstract
Class imbalance is an inherent issue in many real-world machine learning applications, including software defect prediction, fraud detection, network intrusion and penetration detection, risk management, and medical data. It arises when one class has very few instances, usually the class the procedure is meant to identify, because the event that class represents is rare. A major driving force for class imbalance learning is the high priority placed on correctly classifying minority instances, which incur a higher cost when misclassified than majority instances. Supervised models are often designed to maximize overall classification accuracy; because minority examples are rare in the training data, such models typically misclassify them. Balancing the dataset before training helps keep the model from becoming biased toward one class; in other words, the model does not favor the majority class simply because it has seen more majority data. Data sampling is one way to reduce class imbalance before training classification models; however, most existing sampling methods introduce additional problems during sampling and frequently overlook other data quality concerns. The goal of this work is therefore to create an effective sampling algorithm that, using a straightforward logical framework, improves the performance of classification algorithms. This research contributes to both academia and industry by providing a thorough literature review on class imbalance and by developing and implementing a novel Cluster Under Sampling Technique (CUST). CUST is shown to greatly improve the performance of popular classification techniques such as the C4.5 decision tree and One Rule when learning from imbalanced datasets.
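The accuracy problem described above, and the general idea of data sampling, can be illustrated with a minimal Python sketch. It is not CUST: the resampling step is plain random undersampling, swapped in only because the abstract does not specify the CUST procedure. The function names, the 95/5 class split, and the seed are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def majority_baseline_scores(labels, minority=1):
    """Accuracy and minority recall of a model that always predicts the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    predictions = [majority] * len(labels)
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    minority_total = sum(y == minority for y in labels)
    minority_hits = sum(p == y == minority for p, y in zip(predictions, labels))
    recall = minority_hits / minority_total if minority_total else 0.0
    return accuracy, recall


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until each class matches the smallest one.

    This is generic random undersampling, NOT the CUST algorithm.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    sampled_rows, sampled_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            sampled_rows.append(row)
            sampled_labels.append(label)
    return sampled_rows, sampled_labels


# Hypothetical toy data: 95 majority instances (label 0), 5 minority instances (label 1).
labels = [0] * 95 + [1] * 5
rows = list(range(100))

accuracy, recall = majority_baseline_scores(labels)
balanced_rows, balanced_labels = random_undersample(rows, labels)

print(f"majority-only accuracy: {accuracy:.2f}, minority recall: {recall:.2f}")
print("class counts after undersampling:", dict(Counter(balanced_labels)))
```

On this toy data, a model that ignores the minority class entirely still scores high on overall accuracy while identifying no minority instances, which is why accuracy alone is a poor objective under imbalance. Random undersampling balances the classes but discards majority rows arbitrarily, one example of the side effects of existing sampling methods that the abstract refers to.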
More From: Journal of Applied Science, Information and Computing