Abstract

Heterogeneous defect prediction (HDP) is a significant research topic in cross-project defect prediction (CPDP), due to the inconsistency of metrics used between source and target projects. While most HDP methods aim to improve the performance of models trained on data from one source project, few studies have investigated how the number of source projects affects predictive performance. In this paper, we propose a new multi-source heterogeneous kernel mapping (MSHKM) algorithm to analyze the effects of different numbers of source projects on prediction results. First, we introduce two strategies based on MSHKM for multi-source HDP. To determine the impact of the number of source projects on the predictive performance of the model, we regularly vary the number of source projects in each strategy. Then, we compare the proposed MSHKM with state-of-the-art HDP methods and within-project defect prediction (WPDP) methods, in terms of three common performance measures, using 28 data sets from five widely used projects. Our results demonstrate that, (1) in the multi-source HDP scenario, strategy 2 outperforms strategy 1; (2) for MSHKM, a lower number of source projects leads to better results and performance under strategy 1, while n = 4 is the optimal number under strategy 2; (3) MSHKM performs better than related state-of-the-art HDP methods; and (4) MSHKM outperforms WPDP. In summary, our proposed MSHKM algorithm provides a promising solution for heterogeneous cross-project defect prediction, and our findings suggest that the number of source projects should be carefully selected to achieve optimal predictive performance.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call