An Empirical Study on Combining Source Selection and Transfer Learning for Cross-Project Defect Prediction

Wanzhi Wen, Bin Zhang, Xiaolin Ju, Xiang Gu

doi:10.1109/ibf.2019.8665492

Abstract

Software defect prediction (SDP) helps software developers and quality assurance personnel predict software fault proneness effectively. Researchers have recently proposed many methods to improve prediction results, especially in the within-project defect prediction (WPDP) setting. Cross-project defect prediction (CPDP), however, remains difficult because the data distributions of the source and target projects differ. Transfer learning models have been shown to reduce this distribution difference effectively. Intuitively, if a better source is selected, a transfer learning model should achieve better prediction performance. In this paper, we conduct an empirical study on source selection for CPDP, covering both feature selection and source project selection, and then combine source selection with the popular transfer learning model TCA+. The results show that the combined technique, MZTCA+, effectively improves on state-of-the-art CPDP models such as TCA+, LT, Dycom, and TDS.

Full Text