Abstract

Class imbalance in the output class distribution is a characteristic feature of many real-world datasets, common in practical business applications such as spam filtering and fraud detection. Most learning algorithms overlook predictive contributions from the minority, or underrepresented, class. One way to address this challenge is to apply re-sampling techniques that reduce class distribution imbalance and yield a more balanced output class distribution in the training examples. Random over-sampling duplicates minority class examples to achieve a more balanced distribution; random under-sampling deletes training examples from the majority class; and the Synthetic Minority Over-sampling Technique (SMOTE) generates synthetic examples for the minority class. The usefulness of these sampling techniques has received attention in several research studies, particularly for binary (two-class) and multi-class classification problems, where they are typically applied to achieve an equal class distribution in pursuit of optimal model performance. This comparative assessment of random sampling optimization shows significant differences in model performance depending on the random sampling technique applied, and confirms that high prediction accuracy and roc_auc scores can be achieved even when false-alarm performance is poor.
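The two random sampling techniques described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not code from the study: the function names `random_oversample` and `random_undersample` and the toy 8-vs-2 dataset are assumptions for demonstration.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate minority-class examples (with replacement) until classes balance."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        idx = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - n):
            i = rng.choice(idx)  # pick an existing minority example to duplicate
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Delete majority-class examples (without replacement) until classes balance."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for cls in counts:
        idx = [i for i, label in enumerate(y) if label == cls]
        keep.extend(rng.sample(idx, target))  # keep only `target` examples per class
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy imbalanced dataset: 8 majority (class 0) vs 2 minority (class 1) examples
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
_, y_over = random_oversample(X, y)
_, y_under = random_undersample(X, y)
print(sorted(Counter(y_over).items()))   # both classes brought up to 8
print(sorted(Counter(y_under).items()))  # both classes cut down to 2
```

SMOTE differs from plain over-sampling in that it interpolates between a minority example and its nearest minority neighbors to create new synthetic points rather than duplicating existing ones.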
