Towards building a pragmatic cross-project defect prediction model combining non-effort based and effort-based performance measures for a balanced evaluation

Yogita Khatri,Sandeep Kumar Singh

doi:10.1016/j.infsof.2022.106980

Abstract

• We propose a two-phase transfer boosting-based cross-project prediction model. • The proposed model is assessed using non-effort based and effort-based measures. • Training data weights are based on feature importance and inter-project similarity. • We validate the effectiveness of the proposed model on a large corpus of data. Recent years have witnessed the growing trend in cross-project defect prediction (CPDP), where the training and the testing data come from different projects having different data distributions. Several CPDP methods have been presented in the literature to overcome differences in their distributions, but the majority of the existing approaches have been evaluated considering the availability of unlimited inspection effort, which is practically impossible, thus leading to fallacious conclusions. Further, they focused more on improving Recall over Precision leading to a high probability of false alarm (PF), causing significant wastage of developer's efforts and time. Addressing these issues, we propose a Two-Phase Transfer Boosting (TPTB) model, which aims at improving the performance not only in terms of non-effort based measures (NEBMs) (making a balance between Recall and PF) but also in terms of effort based measures (EBMs), considering the availability of limited inspection effort. To mitigate the distribution differences, the first phase assigns initial weights to the training modules based on the feature distribution and feature importance. The second phase applies the Dynamic Transfer AdaBoost algorithm to build an ensemble classifier to lessen the impact of contradictory training modules. In addition, a sorting strategy is designed to prioritize the modules for further inspection. Statistical results on 62 datasets revealed a better-balanced performance of our TPTB model holistically over NN-filter, ManualDown, EASC, and Cruz model with performance comparable to WPDP (Within-project defect prediction) considering NEBMs. Besides, when considering EBMs together, TPTB showed statistically and practically more balanced performance as compared to ManualUP and Cruz with overall performance comparable to EASC. Our results demonstrate the efficacy of the TPTB model in a practical setting empowering the quality assurance team to predict and prioritize the defective modules allocating limited inspection effort by optimally focusing on highly defective modules.

Full Text