Abstract

Cross‐project defect prediction (CPDP) is used to build defect prediction models when the target project does not have enough data. Several approaches have been proposed to improve the performance of CPDP, such as feature transformation and instance selection methods. However, existing techniques depend strongly on the target data to reduce the distribution discrepancy between source and target projects. That is, the performance of these methods is determined by the effectiveness of the feature transformation or by the similarity between the two projects. Additionally, when a large amount of source data must be matched with target data, the matching takes considerable time and reduces the efficiency of model construction. Therefore, it is vital to explore a target project‐agnostic approach to building CPDP models. This paper presents a Weighted Isolation Forest with class Label information Filter (WIFLF) to alleviate these issues. Four groups of datasets from AEEEM, Relink, and the PROMISE Data Repository are used to construct CPDP models, and WIFLF is compared with 12 approaches. The experimental results indicate that WIFLF significantly outperforms all the baselines. Specifically, WIFLF with random forest significantly improves performance over the baselines, on average by at least 14.64% and 4.90% with respect to Skewed F‐Measure and G‐Measure, respectively.
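
For readers unfamiliar with G‐Measure, the following is a minimal sketch, assuming the definition commonly used in the defect prediction literature: the harmonic mean of recall (probability of detection) and specificity (one minus the false alarm rate). The abstract does not state the formula, so this definition, the function name `g_measure`, and the confusion-matrix counts are illustrative assumptions, not the paper's own code.

```python
def g_measure(tp: int, fp: int, tn: int, fn: int) -> float:
    """Harmonic mean of recall and specificity (assumed definition).

    tp, fp, tn, fn are confusion-matrix counts, with the defective
    class treated as the positive class.
    """
    # Recall: share of defective modules that were correctly flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of clean modules that were correctly passed.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if recall + specificity == 0:
        return 0.0
    return 2 * recall * specificity / (recall + specificity)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    print(round(g_measure(tp=8, fp=2, tn=18, fn=2), 4))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model that labels every module as clean (or every module as defective) scores 0 under this definition, which is why such a measure suits class-imbalanced defect data.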
