WIFLF: An approach independent of the target project for cross‐project defect prediction

Can Cui, Bin Liu, Shihai Wang

doi:10.1002/smr.2497

Abstract

Cross‐project defect prediction (CPDP) is used to build defect prediction models when the target project does not have enough data. Several approaches, such as feature transformation and instance selection methods, have been proposed to improve the performance of CPDP. However, existing techniques depend strongly on the target data to reduce the distribution discrepancy between the source and target projects; that is, their performance is determined by the effectiveness of the feature transformation or by the similarity between the two projects. Moreover, when a large amount of source data must be matched against the target data, the matching is time-consuming and reduces the efficiency of model construction. It is therefore important to explore a target project‐agnostic approach to building CPDP models. This paper presents a Weighted Isolation Forest with class Label information Filter (WIFLF) to alleviate these issues. Four groups of datasets from the AEEEM, Relink, and PROMISE data repositories are used to build CPDP models, and WIFLF is compared with 12 approaches. The experimental results indicate that WIFLF significantly outperforms all the baselines. Specifically, WIFLF with random forest significantly improves the performance over the baselines on average by at least 14.64% and 4.90% with respect to Skewed F‐Measure and G‐Measure, respectively.
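One of the two metrics reported above is G‐Measure. In the defect prediction literature it is commonly defined as the harmonic mean of the probability of detection (pd, the recall on defective modules) and 1 - pf, where pf is the probability of false alarm. The abstract does not state the formula, so the minimal Python sketch below assumes that common definition; the function name `g_measure` and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
def g_measure(tp: int, fn: int, fp: int, tn: int) -> float:
    """G-Measure as the harmonic mean of pd and (1 - pf).

    Assumed definition (common in defect prediction, not stated in the abstract):
        pd = TP / (TP + FN)   probability of detection (recall on defective class)
        pf = FP / (FP + TN)   probability of false alarm
        G  = 2 * pd * (1 - pf) / (pd + (1 - pf))
    """
    # Guard against empty classes: treat an undefined rate as 0.0.
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    spec = 1.0 - pf  # 1 - pf, i.e. the true negative rate
    if pd + spec == 0:
        return 0.0
    return 2 * pd * spec / (pd + spec)


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not results from the paper:
    # pd = 40/50 = 0.8 and pf = 20/100 = 0.2, so G = 2*0.8*0.8 / 1.6 = 0.8.
    print(round(g_measure(tp=40, fn=10, fp=20, tn=80), 3))  # 0.8
```

Because G‐Measure combines pd with 1 - pf rather than with precision, a model cannot score well on it simply by flagging every module as defective: doing so drives pf toward 1 and the score toward 0.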
