Abstract

Various studies on cross-project defect prediction (CPDP) have been conducted in the defect prediction literature. These studies rely on a methodology in which the training and test data sets come from different projects, or from different versions of the same project, that may have the same number of features. The configurable parameters of machine learning algorithms should not be disregarded during defect prediction. In this study, the effects of hyperparameter optimization are investigated in CPDP and within-project defect prediction (WPDP). To this end, a novel method is proposed that shows how hyperparameter optimization should be performed in CPDP, and two new procedures are introduced that take the structure of heterogeneous data sets into account. First, a defect prediction model is built on 20 data sets; various hyperparameters are then optimized, and the performance of CPDP and WPDP is compared. According to the obtained results: (i) CPDP is, on average, superior to WPDP under hyperparameter optimization; (ii) the linear kernel of SVM performs better than the polynomial and radial kernels in terms of hyperparameter optimization; (iii) maximum tree depth (interaction.depth) is crucial for increasing accuracy when a tree-based algorithm is used.
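
The abstract does not specify the tooling or search grids used; the following is a minimal, illustrative sketch of the kind of hyperparameter search it describes, written with scikit-learn. The synthetic data, parameter grids, and estimators below are assumptions for illustration only; interaction.depth from the paper corresponds here to the max_depth parameter of a gradient-boosted tree model.

    # Illustrative sketch (assumed setup, not the paper's actual experiment):
    # grid search over SVM kernels and over max tree depth of a boosted model.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC
    from sklearn.ensemble import GradientBoostingClassifier

    # Stand-in defect data: binary labels (defective / clean) over numeric metrics.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # SVM: compare linear, polynomial, and radial (RBF) kernels while tuning C.
    svm_grid = GridSearchCV(
        SVC(),
        param_grid={"kernel": ["linear", "poly", "rbf"], "C": [0.1, 1, 10]},
        cv=5,
        scoring="accuracy",
    )
    svm_grid.fit(X, y)
    print("best SVM params:", svm_grid.best_params_)

    # Tree-based model: tune maximum tree depth (analogous to interaction.depth).
    gbm_grid = GridSearchCV(
        GradientBoostingClassifier(random_state=0),
        param_grid={"max_depth": [1, 2, 3, 5], "n_estimators": [100, 200]},
        cv=5,
        scoring="accuracy",
    )
    gbm_grid.fit(X, y)
    print("best tree-model params:", gbm_grid.best_params_)

In a CPDP setting, the training data would come from one project and the evaluation data from another, rather than from the cross-validation split shown above.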
