Cross project defect prediction for open source software

Anushree Agrawal, Ruchika Malhotra

doi:10.1007/s41870-019-00299-6

Abstract

Software defect prediction is the process of identifying defects early in the life cycle so as to optimize testing resources and reduce maintenance effort. Defect prediction works well if a sufficient amount of data is available to train the prediction model. However, this is not always the case, for example when the software is in its first release or the company has not maintained significant data. In such cases, cross project defect prediction may identify the defective classes. In this work, we studied the feasibility of cross project defect prediction and validated it empirically. We conducted our experiments on 12 open source datasets. The prediction model is built using 12 software metrics. After studying the various train-test combinations, we found that cross project defect prediction was feasible in 35 out of 132 cases. The success of prediction is determined by the precision, recall and AUC of the prediction model. We also analyzed 14 descriptive characteristics to construct a decision tree. The decision tree learnt from this data has 15 rules which describe the feasibility of successful cross project defect prediction.
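As a rough illustration of the experimental setup described above, the 132 cases match the number of ordered (train, test) pairs that can be formed from 12 distinct projects (12 × 11), assuming each project is used once as training data for every other project. The sketch below enumerates such pairs and computes precision and recall for the defective class. The project names and the helper functions `cross_project_pairs` and `precision_recall` are hypothetical, and the abstract does not state the thresholds used to judge a prediction successful, so none are assumed here.

```python
from itertools import permutations


def cross_project_pairs(projects):
    """Return every ordered (train, test) pair of distinct projects."""
    return list(permutations(projects, 2))


def precision_recall(y_true, y_pred):
    """Precision and recall for the defective class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Placeholder names: the abstract does not list the 12 datasets.
projects = [f"project_{i}" for i in range(1, 13)]
pairs = cross_project_pairs(projects)
print(len(pairs))  # 132 ordered train-test combinations
```

In a full experiment, each pair would train a model on the first project's metrics and score it on the second project's classes; AUC would additionally require predicted probabilities rather than hard labels.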

Full Text