Abstract

Software defect prediction is an active research area. Researchers have proposed many approaches to overcome the class imbalance problem in defect data and to build effective machine learning models that are not biased towards the majority class. Generative adversarial networks (GANs) are a state-of-the-art technique that can generate synthetic samples of the minority class and thereby produce a balanced dataset. However, they have not been thoroughly investigated in the area of imbalanced defect prediction. In this paper, we propose combining GAN-based methods with boosting ensembles to yield robust defect prediction models. GAN-based methods were used to balance the defect datasets, and the AdaBoost ensemble was employed to classify modules as defective or non-defective. The proposed approach was evaluated on 10 software defect datasets with different imbalance ratios. The Wilcoxon effect size and the Scott–Knott effect size difference test were used to quantify the performance differences between models. Empirical results indicated that GAN-based methods need hyperparameter optimization when used for imbalanced software defect prediction. GAN-based methods outperformed all of the traditional sampling techniques they were compared against. Lastly, the results demonstrated that GAN-based methods should not be combined with undersampling to handle imbalance problems.
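
As a rough illustration of what balancing a defect dataset involves, the following Python sketch computes an imbalance ratio and the number of synthetic minority-class samples a generator would have to produce to reach a 1:1 class balance. It assumes the imbalance ratio is defined as the majority count divided by the minority count, which the abstract does not specify. The function names and the example label counts are hypothetical, and the sketch does not implement the GAN or the AdaBoost classifier.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return majority-class count divided by minority-class count.

    Assumption: this is one common definition of the imbalance ratio;
    the paper may define it differently.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    return max(counts.values()) / min(counts.values())


def synthetic_samples_needed(labels):
    """Return how many minority samples must be added for a 1:1 balance."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    return max(counts.values()) - min(counts.values())


# Hypothetical dataset: 1 marks a defective module, 0 a non-defective one.
labels = [0] * 90 + [1] * 10

ratio = imbalance_ratio(labels)
needed = synthetic_samples_needed(labels)
print(f"imbalance ratio: {ratio}, synthetic samples needed: {needed}")
```

In a pipeline like the one described above, a trained generator would supply the missing minority samples, and the classifier would then be fitted on the balanced training set only, leaving the test set untouched.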
