Abstract

Class imbalance has been one of the most challenging problems in fields such as machine learning and data mining for the past few decades. An imbalanced dataset is one in which the classes are unevenly distributed: the positive (minority) class is smaller than the negative (majority) class. In this setting, many standard classification algorithms fail to classify instances of the positive class correctly, even though the positive class is typically the main target of the classification task. Several approaches have been proposed to deal with this problem, for example sampling-based methods (over-sampling and under-sampling), classifier-level enhancements, and combinations of two or more classifiers. The major problem, however, is that most of these solutions come at a cost: they may penalize the negative class, increase computational cost or storage requirements, or lengthen training. Upsampling or downsampling the data is one possible way to address data skew. In this paper, we present a hybrid technique (SMO-RF) that applies the Synthetic Minority Oversampling Technique (SMOTE) followed by a random forest algorithm to classify binary imbalanced data. We tested our model on four standard imbalanced datasets and obtained higher F-measure, G-mean, and ROC values on all of them.
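To make the oversampling step and two of the reported metrics concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names `smote` and `f_measure_and_g_mean`, the neighbour count `k`, and the toy data are assumptions chosen for illustration. The random forest stage and the ROC computation are omitted; in a full pipeline the classifier would be trained on the original data plus the synthetic minority points.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (SMOTE-style interpolation).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two points (tuples of floats).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def f_measure_and_g_mean(y_true, y_pred, positive=1):
    """Return (F-measure, G-mean) for a binary classification result."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0       # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    g_mean = math.sqrt(recall * specificity)
    return f_measure, g_mean


# Toy minority class (hypothetical data) and six synthetic points
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)

# Toy labels: 3 positives, 5 negatives (hypothetical predictions)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
f_measure, g_mean = f_measure_and_g_mean(y_true, y_pred)
```

Both metrics reward performance on the positive class: the F-measure combines precision and recall, while the G-mean is the geometric mean of sensitivity and specificity, so a classifier that ignores the minority class scores zero on both regardless of its overall accuracy.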
