An Empirical Study on Heterogeneous Defect Prediction Approaches

Haowen Chen,Xiao-Yuan Jing,Di Wu,Yi Peng,Zhiqiang Li,Zhiguo Huang

doi:10.1109/tse.2020.2968520

Abstract

Software defect prediction has always been a hot research topic in the field of software engineering owing to its capability of allocating limited resources reasonably. Compared with cross-project defect prediction (CPDP), heterogeneous defect prediction (HDP) further relaxes the limitation of defect data used for prediction, permitting different metric sets to be contained in the source and target projects. However, there is still a lack of a holistic understanding of existing HDP studies due to different evaluation strategies and experimental settings. In this paper, we provide an empirical study on HDP approaches. We review the research status systematically and compare the HDP approaches proposed from 2014 to June 2018. Furthermore, we also investigate the feasibility of HDP approaches in CPDP. Through extensive experiments on 30 projects from five datasets, we have the following findings: (1) metric transformation-based HDP approaches usually result in better prediction effects, while metric selection-based approaches have better interpretability. Overall, the HDP approach proposed by Li <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">et al.</i> (CTKCCA) currently has the best performance. (2) Handling class imbalance problems can boost the prediction effects, but the improvements are usually limited. In addition, utilizing mixed project data cannot improve the performance of HDP approaches consistently since the label information in the target project is not used effectively. (3) HDP approaches are feasible for cross-project defect prediction in which the source and target projects have the same metric set.

Full Text