Abstract

Cross-project defect prediction (CPDP) refers to predicting defects in the target project lacking of defect data by using prediction models trained on the historical defect data of other projects (i.e., source data). However, CPDP requires the source and target projects have common metric set (CPDP-CM). Recently, heterogeneous defect prediction (HDP) has drawn the increasing attention, which predicts defects across projects having heterogeneous metric sets. However, building high-performance HDP methods remains a challenge owing to several serious challenges including class imbalance problem, nonlinear, and the distribution differences between source and target datasets. In this paper, we propose a novel kernel spectral embedding transfer ensemble (KSETE) approach for HDP. KSETE first addresses the class-imbalance problem of the source data and then tries to find the latent common feature space for the source and target datasets by combining kernel spectral embedding, transfer learning, and ensemble learning. Experiments are performed on 22 public projects in both HDP and CPDP-CM scenarios in terms of multiple well-known performance measures such as, AUC , G-Measure , and MCC . The experimental results show that (1) KSETE improves the performance over previous HDP methods by at least 22.7, 138.9, and 494.4 percent in terms of AUC , G-Measure , and MCC , respectively. (2) KSETE improves the performance over previous CPDP-CM methods by at least 4.5, 30.2, and 17.9 percent in AUC , G-Measure , and MCC , respectively. It can be concluded that the proposed KSETE is very effective in both the HDP scenario and the CPDP-CM scenario.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.