Abstract

In the current software era, producing reliable, high-quality software is vital. Early detection of defects helps build dependable software at reduced cost and resource expenditure. Researchers therefore have a keen interest in building machine learning models for effective and accurate software defect prediction in the early stages of development. These models are typically built from object-oriented software metrics. However, they may yield biased predictions owing to the class imbalance present in most software defect datasets. This paper provides an effective defect prediction framework for imbalanced data by employing cost-sensitive classifiers and stable performance measures such as GMean, Balance, and AUC. Four decision tree-based classifiers with different cost ratios are investigated to predict defects in three Apache projects. The empirical results are statistically validated using the nonparametric Friedman test and the Wilcoxon signed-rank test. The results show, with 99% confidence, that the predictive capability of J48, AdaBoostM1, Bagging, and RandomSubSpace improves after employing cost-sensitive learning.
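
To make the pipeline described above concrete, the sketch below shows cost-sensitive learning on an imbalanced dataset together with the GMean, Balance, and AUC measures. It is an illustrative approximation only: it uses scikit-learn's DecisionTreeClassifier with class weights as a stand-in for the Weka classifiers named in the abstract (J48, AdaBoostM1, Bagging, RandomSubSpace), and the synthetic data, the 1:5 cost ratio, and all parameter values are assumptions, not values taken from the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score, confusion_matrix

# Synthetic imbalanced dataset standing in for a project's
# object-oriented metrics (hypothetical data, for illustration only).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Cost-sensitive learning: misclassifying the minority (defective)
# class costs more. The 1:5 cost ratio is an illustrative choice.
clf = DecisionTreeClassifier(class_weight={0: 1, 1: 5}, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)[:, 1]

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
pd_rate = tp / (tp + fn)   # probability of detection (recall)
pf_rate = fp / (fp + tn)   # probability of false alarm
tnr = tn / (tn + fp)       # true negative rate

# Imbalance-aware measures commonly used in defect prediction.
gmean = np.sqrt(pd_rate * tnr)
balance = 1 - np.sqrt((0 - pf_rate) ** 2 + (1 - pd_rate) ** 2) / np.sqrt(2)
auc = roc_auc_score(y_test, y_score)

print(f"GMean={gmean:.3f}  Balance={balance:.3f}  AUC={auc:.3f}")
```

Varying the cost ratio (e.g., 1:2, 1:5, 1:10) and comparing these measures across classifiers mirrors, in spirit, the comparison the paper validates with the Friedman and Wilcoxon signed-rank tests.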
