Software change‐proneness prediction through combination of bagging and resampling methods

Xiaoyan Zhu, Lei Zhu, Xiaolin Jia, Long Cheng, Yueyang He

doi:10.1002/smr.2111

Abstract

Identifying the change‐prone parts of software can help managers and developers allocate maintenance resources and time effectively during the early phases of the software life cycle. File‐level change‐proneness prediction with binary classification methods makes such identification possible. Because change‐prone files usually account for only a small fraction of all files, standard classification methods perform unsatisfactorily. In this paper, we employ imbalanced learning methods, including bagging, resampling, and especially their combination, to mitigate the performance loss that the class imbalance problem causes for standard classifiers in change‐proneness prediction. We also propose a boxplot‐based partition method that assigns more appropriate change‐proneness labels to the training data. Eight open‐source Java projects are used in an empirical study to validate the effectiveness of the combination methods in change‐proneness prediction. The experimental results show that combining bagging with resampling significantly improves prediction performance over using bagging or resampling alone. Of all the combination methods employed, the combination of bagging with undersampling performs best. In addition, the support vector machine is more effective as a base classifier than J48 and naive Bayes.
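To make the idea of combining bagging with undersampling concrete, the following is a minimal, generic sketch and not the authors' implementation. It assumes binary labels and numeric features. Each bag bootstraps the minority class and draws an equal number of majority-class rows, and the ensemble predicts by majority vote. The nearest-centroid base learner is a stand-in chosen to keep the example dependency-free (the study itself uses support vector machine, J48, and naive Bayes). All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def undersampled_bag(X, y, rng):
    """Draw one balanced bag: bootstrap the minority class, then sample an
    equal number of majority-class rows without replacement."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    picked = [rng.choice(min_idx) for _ in min_idx]  # bootstrap (with replacement)
    picked += rng.sample(maj_idx, len(min_idx))      # undersample the majority
    return [X[i] for i in picked], [y[i] for i in picked]


def fit_centroids(X, y):
    """Stand-in base learner (assumption): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(model, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(model[label], row)),
    )


def fit_underbagging(X, y, n_bags=11, seed=0):
    """Train one base learner per balanced bag."""
    rng = random.Random(seed)
    return [fit_centroids(*undersampled_bag(X, y, rng)) for _ in range(n_bags)]


def predict_underbagging(models, row):
    """Majority vote over the base learners."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Toy, invented data: 40 "stable" files (label 0) and 4 "change-prone" files (label 1).
X = [[float(i % 5), float(i % 3)] for i in range(40)]
X += [[10.0 + i, 10.0 + i] for i in range(4)]
y = [0] * 40 + [1] * 4

models = fit_underbagging(X, y)
print(predict_underbagging(models, [11.0, 11.0]))
print(predict_underbagging(models, [1.0, 1.0]))
```

Because every bag is balanced, no single base learner is dominated by the majority class, and the vote across bags offsets the information discarded by undersampling in any one bag.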
