Abstract

Context: Previous studies have shown that a transfer learning model, TCA+ proposed by Nam et al., can significantly improve the performance of cross-project defect prediction (CPDP). TCA+ achieves the improvement by reducing data distribution difference between source (training data) and target (testing data) projects. However, TCA+ is unstable, i.e., its performance varies largely when using different source projects to build prediction models. In practice, it is hard to choose a suitable source project to build the prediction model.Objective: To address the limitation of TCA+, we propose a two-phase transfer learning model (TPTL) for CPDP.Method: In the first phase, we propose a source project estimator (SPE) to automatically choose two source projects with the highest distribution similarity to a target project from candidates. Next, two source projects that are estimated to achieve the highest values of F1-score and cost-effectiveness are selected. In the second phase, we leverage TCA+ to build two prediction models based on the two selected projects and combine their prediction results to further improve the prediction performance.Results: We evaluate TPTL on 42 defect datasets from PROMISE repository, and compare it with two versions of TCA+ (TCA+_Rnd, randomly selecting one source project; TCA+_All, using all alternative source projects), a related source project selection model TDS proposed by Herbold, a state-of-the-art CPDP model leveraging a log transformation (LT) method, and a transfer learning model Dycom with better form of TCA. Experiment results show that, on average across 42 datasets, TPTL respectively improves these baseline models by 19%, 5%, 36%, 27%, and 11% in terms of F1-score; by 64%, 92%, 71%, 11%, and 66% in terms of cost-effectiveness.Conclusion: The proposed TPTL model can solve the instability problem of TCA+, showing substantial improvements over the state-of-the-art and related CPDP models.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call