Abstract

Machine learning classifiers perform well on balanced datasets. Unfortunately, many real-world datasets are naturally imbalanced, which makes imbalanced classification a serious problem in machine learning: a skewed class distribution hinders classifiers from correctly classifying the minority class. This paper introduces Reduced Noise-SMOTE (RN-SMOTE) for pre-processing imbalanced data. RN-SMOTE first oversamples the training data using SMOTE, which introduces noisy synthetic instances into the minority class. It then applies DBSCAN to detect and remove this noise. Next, the clean synthetic instances are combined with the original data. Finally, RN-SMOTE applies SMOTE again to rebalance the dataset before passing it to the underlying classifier. RN-SMOTE is evaluated using 9 different classifiers and 9 imbalanced datasets with different imbalance ratios, five of which are used for outlier detection. The results show that RN-SMOTE improves classifier performance, outperforming both the original data and SMOTE by margins that vary with the classifier, dataset and evaluation metric. Compared with the current state of the art, RN-SMOTE achieves increases of up to 37.41%, 23.28%, 13.95% and 9.07% in Recall, F1, Precision and Accuracy, respectively.
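The four RN-SMOTE steps described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions not stated in the abstract: labels are binary; the first SMOTE pass oversamples the minority class to full balance; DBSCAN is run on the minority class (original plus synthetic points) and only synthetic points labelled as noise are discarded; and the second SMOTE pass interpolates from the original minority points plus the surviving synthetic ones. The function names `smote`, `dbscan_noise_mask` and `rn_smote`, and the parameter values for `eps`, `min_samples` and `k`, are hypothetical. In practice one would likely use imbalanced-learn's `SMOTE` and scikit-learn's `DBSCAN` instead of these hand-rolled versions.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Classic SMOTE: create n_new points by interpolating a random minority
    point toward one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    if n_new <= 0:
        return np.empty((0, X_min.shape[1]))
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))             # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

def dbscan_noise_mask(X, eps, min_samples):
    """True where DBSCAN would label a point as noise: it is not a core point
    and lies within eps of no core point."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    within = d <= eps                        # neighbourhood includes the point itself
    core = within.sum(axis=1) >= min_samples
    reachable = (within & core[None, :]).any(axis=1)
    return ~reachable

def rn_smote(X, y, minority=1, eps=0.5, min_samples=5, k=5, seed=0):
    """Hypothetical RN-SMOTE sketch for binary labels (assumptions in lead-in)."""
    rng = np.random.default_rng(seed)
    X_min, X_maj = X[y == minority], X[y != minority]
    # Step 1: SMOTE oversampling, which may create noisy synthetic points.
    synth = smote(X_min, len(X_maj) - len(X_min), k, rng)
    # Step 2: DBSCAN on the minority class; drop synthetic points marked as noise.
    noise = dbscan_noise_mask(np.vstack([X_min, synth]), eps, min_samples)
    clean = synth[~noise[len(X_min):]]
    # Step 3: combine the clean synthetic points with the original minority data.
    X_min2 = np.vstack([X_min, clean])
    # Step 4: SMOTE again so both classes end up the same size.
    extra = smote(X_min2, len(X_maj) - len(X_min2), k, rng)
    X_min_final = np.vstack([X_min2, extra])
    X_out = np.vstack([X_maj, X_min_final])
    y_out = np.concatenate([y[y != minority], np.full(len(X_min_final), minority)])
    return X_out, y_out
```

For example, with 100 majority and 20 minority samples, `rn_smote` returns 200 rows with 100 per class. The original rows are kept unchanged, and only the synthetic minority points are filtered and regenerated.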
