An Empirical Study of Robustness and Stability of Machine Learning Classifiers in Software Defect Prediction

Arvinder Kaur, Kamaldeep Kaur

doi:10.1007/978-3-319-11218-3_35

Abstract

Software is one of the key drivers of twenty-first-century business and society. Delivering high-quality software systems is a challenging task for software developers. Early software defect prediction, based on software code metrics, has been intensely researched by the software engineering research community. Recent advances in machine learning have been widely explored for developing highly accurate automatic software defect prediction models. This study contributes to the application of machine learning in software defect prediction by investigating the robustness and stability of 17 classifiers on 44 open source software defect prediction data sets obtained from the PROMISE repository. The area under the Receiver Operating Characteristic (ROC) curve (AUC) is obtained for each of the 17 classifiers on each of the 44 defect prediction data sets. Our experiments show that Random Forests, Logistic Regression and Kstar are robust as well as stable classifiers for software defect prediction applications. Further, we demonstrate that Naive Bayes and Bayes Networks, which have been shown to be robust and comprehensible classifiers in previous work on software defect prediction, have poor stability in open source software defect prediction.
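As an illustration of the evaluation measure named in the abstract, the following minimal sketch computes AUC from defect labels and classifier scores using its rank interpretation: the probability that a randomly chosen defective module receives a higher score than a randomly chosen clean one. The abstract does not define how robustness and stability are quantified, so the `summarize` helper is hypothetical; it uses the mean and the spread of AUC across data sets only as assumed stand-ins, not as the authors' definitions.

```python
from statistics import mean, pstdev


def auc(labels, scores):
    """AUC as the fraction of (defective, clean) pairs ranked correctly.

    labels: 1 for a defective module, 0 for a clean one.
    scores: classifier scores, higher meaning "more likely defective".
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean module")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize(auc_by_dataset):
    """Hypothetical summary of one classifier's AUC values across data sets.

    Assumption: mean AUC and its population standard deviation are used
    here as simple proxies; the paper may define its measures differently.
    """
    values = list(auc_by_dataset.values())
    return {"mean_auc": mean(values), "auc_spread": pstdev(values)}
```

Calling `auc` once per classifier and data set, then passing each classifier's results to `summarize`, yields one row of a comparison table; the data set names used as dictionary keys are up to the caller.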

Full Text