Heterogeneous cross-company defect prediction by unified metric representation and CCA-based transfer learning

Xiaoyuan Jing, Baowen Xu, Xiwei Dong, Fumin Qi, Fei Wu

doi:10.1145/2786805.2786813

Abstract

Cross-company defect prediction (CCDP) learns a prediction model from the training data of one or more projects of a source company and then applies the model to target company data. Existing CCDP methods assume that the source and target company data share the same software metrics. In practice, however, the source and target company data are usually heterogeneous: the metrics used, and the size of the metric set, differ between the two companies. We call CCDP in this scenario the heterogeneous CCDP (HCCDP) task. In this paper, we aim to provide an effective solution for HCCDP. We propose a unified metric representation (UMR) for the data of the source and target companies. The UMR consists of three types of metrics: the metrics common to the source and target companies, the source-company-specific metrics, and the target-company-specific metrics. To construct the UMR for the source company data, the target-company-specific metrics are set to zero; for the UMR of the target company data, the source-company-specific metrics are set to zero. Based on the UMR, we introduce canonical correlation analysis (CCA), an effective transfer learning method, into CCDP for the first time to make the data distributions of the source and target companies similar. Experiments on 14 public heterogeneous datasets from four companies indicate that: 1) for HCCDP with partially different metrics, our approach significantly outperforms state-of-the-art CCDP methods; 2) for HCCDP with totally different metrics, our approach achieves prediction performance comparable to within-project prediction results. The proposed approach is effective for HCCDP.
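The zero-padding step of the UMR can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_umr`, the metric names, and the sample values are hypothetical, and matching metrics by name is an assumption made only for illustration. The sketch covers the padding step alone; the CCA-based transfer learning step is not shown.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def build_umr(
    source_metrics: Sequence[str],
    source_data: Sequence[Sequence[float]],
    target_metrics: Sequence[str],
    target_data: Sequence[Sequence[float]],
) -> Tuple[List[str], Matrix, Matrix]:
    """Pad source and target data to one shared column layout.

    Columns are ordered as: common metrics, source-specific metrics,
    target-specific metrics. A metric that a company does not collect
    is filled with 0.0, as described in the abstract.
    """
    # Assumption: two metrics are "common" when their names match exactly.
    common = [m for m in source_metrics if m in target_metrics]
    source_only = [m for m in source_metrics if m not in target_metrics]
    target_only = [m for m in target_metrics if m not in source_metrics]
    columns = common + source_only + target_only

    def pad(metrics: Sequence[str], data: Sequence[Sequence[float]]) -> Matrix:
        # Map each metric name to its column position in the original data.
        index = {name: i for i, name in enumerate(metrics)}
        return [
            [float(row[index[c]]) if c in index else 0.0 for c in columns]
            for row in data
        ]

    return columns, pad(source_metrics, source_data), pad(target_metrics, target_data)


# Hypothetical example: the two companies share only the "loc" metric.
source_metrics = ["loc", "cyclomatic", "halstead_volume"]
source_data = [[120.0, 7.0, 310.5], [45.0, 2.0, 98.0]]
target_metrics = ["loc", "wmc", "cbo"]
target_data = [[200.0, 12.0, 5.0]]

columns, source_umr, target_umr = build_umr(
    source_metrics, source_data, target_metrics, target_data
)
```

After padding, every source and target instance has the same number of columns, so both datasets can be passed to a single downstream procedure. When the two companies share no metrics at all, the common block is empty and the layout reduces to source-specific columns followed by target-specific columns.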

Similar Papers

Cross-company defect prediction via semi-supervised clustering-based data filtering and MSTrA-based transfer learning
Xiao Yu ... Yiheng Jian
Soft Computing | VOL. 22
08 Mar 2018

A Multi-Source TrAdaBoost Approach for Cross-Company Defect Prediction
Xiao Yu ... Guoping Nie
01 Jul 2016

Semi-supervised Heterogeneous Defect Prediction with Open-source Projects on GitHub
Ying Sun ... Fei Wu
International Journal of Software Engineering and Knowledge Engineering | VOL. 31
01 Jun 2021

Improving Cross-Company Defect Prediction with Data Filtering
Xiao Yu ... Weiqiang Peng
International Journal of Software Engineering and Knowledge Engineering | VOL. 27
01 Nov 2017
