Abstract

Multi-source cross-project defect prediction (MSCPDP) refers to transferring defect knowledge from multiple source projects to a target project. MSCPDP has drawn increasing attention from academic and industrial communities owing to its advantages over single-source cross-project defect prediction (SSCPDP), and a number of MSCPDP models have been proposed. However, to the best of our knowledge, no empirical study has investigated the effect of different MSCPDP models on prediction performance. To comprehensively investigate the performance of different MSCPDP models, we first conduct a literature review of MSCPDP studies, and then identify and compare 7 state-of-the-art MSCPDP models on 20 publicly available defect datasets in terms of multiple performance measures, including PD, PF, area under the ROC curve (AUC), F1, precision, Matthews correlation coefficient (MCC), and Popt20%. Furthermore, a robust multiple-comparison method, the Scott-Knott effect size difference (ESD) test, is used for statistical testing. The experimental results show that (1) Burak's Filter consistently performs best in terms of precision, AUC, MCC, and Popt20%, but not F1; (2) MSCPDP models outperform the mean performance of SSCPDP models on most datasets; and (3) the performance of MSCPDP models still needs further improvement. We suggest that software engineers use MSCPDP models rather than SSCPDP models for CPDP, and that they pay more attention to both the distribution differences across datasets and the problems of sample similarity and sample weighting when building MSCPDP models.
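To make the listed measures concrete, the following minimal sketch (not taken from the paper; the labels, scores, and variable names are illustrative assumptions) shows how PD, PF, AUC, F1, precision, and MCC can be computed for a single prediction run with scikit-learn. Popt20% is omitted because it additionally requires per-module effort information.

import numpy as np
from sklearn.metrics import (confusion_matrix, f1_score, matthews_corrcoef,
                             precision_score, recall_score, roc_auc_score)

# Hypothetical defect labels and predictions for one target project.
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])    # actual labels: 1 = defective
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1])    # predicted labels
y_score = np.array([0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.3, 0.7])  # predicted probabilities

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
pd_ = recall_score(y_true, y_pred)      # PD (probability of detection) = recall
pf = fp / (fp + tn)                     # PF (probability of false alarm)
auc = roc_auc_score(y_true, y_score)    # AUC from predicted probabilities
f1 = f1_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)

print(f"PD={pd_:.3f} PF={pf:.3f} AUC={auc:.3f} "
      f"F1={f1:.3f} precision={precision:.3f} MCC={mcc:.3f}")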

