Evaluating Data Filter on Cross-Project Defect Prediction: Comparison and Improvements

Yong Li, Zhiqiu Huang, Yong Wang, Bingwu Fang

doi:10.1109/access.2017.2771460

Abstract

Cross-project defect prediction (CPDP) is a field of study in which a software project that lacks sufficient local data uses data from other projects to build defect predictors. To support CPDP, the cross-project data must be carefully filtered before being applied locally. Researchers have devised and implemented many data filters to improve CPDP performance. However, it remains unclear which data filter strategy is most effective in CPDP, both in general and in specific cases. This paper provides an extensive comparison of well-known data filters and a novel filter that we propose. We perform experiments on 44 releases of 14 open-source projects, using Naive Bayes and a support vector machine as the underlying classifiers. The results demonstrate that data filtering significantly improves CPDP performance, and that the proposed hierarchical select-based filter performs significantly better than the other filters. Moreover, with an appropriate data filter strategy, a defect predictor built from cross-project data can outperform one learned from within-project data.
