Multi-source cross-project software defect prediction based on deep integration

Jing Zhang, Yun He, Wei Wang, Xinfa Li

doi:10.1088/1742-6596/1861/1/012075

Abstract

Cross-project defect prediction (CPDP), which trains a machine learning model on data from other projects, has attracted much attention in recent years. It offers a feasible way to carry out defect prediction for small-scale or newly developed projects that lack sufficient training data. This paper focuses on the main bottleneck of CPDP, poor prediction accuracy, and proposes a deep learning model integration-based approach (MTrADL) for CPDP. The approach consists of two main phases. First, the similarity between the target project and each source project is measured with a modified maximum mean discrepancy (MMD), and the top K source projects most similar to the target project are selected as training data. Second, a convolutional neural network (CNN) defect predictor is built for each selected source dataset, and the multiple predictors are then integrated to produce the final prediction. To evaluate the proposed approach, we conduct experiments on 41 datasets from the PROMISE repository and compare it with three state-of-the-art baselines: a training data selection model (TDS), a two-stage transfer learning model (TPTL), and a multi-source transfer learning model (MTrA). The experimental results show that the average F1-score of our approach is 0.76. Averaged over the 41 datasets, MTrADL improves on these baselines by 39.8%, 28.50%, and 10%, respectively, in terms of F1-score.
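The two-phase pipeline described above (MMD-based top-K source selection, then integration of one predictor per selected source) can be illustrated with the minimal sketch below. It is not the authors' implementation. The abstract does not specify the "modified" MMD or the integration rule, so this sketch assumes the standard biased empirical squared MMD with a Gaussian (RBF) kernel and assumes integration by averaging predicted defect probabilities. The trained CNNs are replaced by opaque callables, and all function names (`mmd_squared`, `select_top_k_sources`, `ensemble_predict`) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs: List[Vector], ys: List[Vector], gamma: float = 1.0) -> float:
    """Biased empirical estimate of squared MMD between two samples.

    Standard MMD used as a stand-in; the paper's "modified" MMD is not reproduced.
    """
    def mean_kernel(a: List[Vector], b: List[Vector]) -> float:
        return sum(rbf_kernel(p, q, gamma) for p in a for q in b) / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def select_top_k_sources(target: List[Vector],
                         sources: Dict[str, List[Vector]],
                         k: int,
                         gamma: float = 1.0) -> List[str]:
    """Return the names of the k sources with the smallest MMD to the target,
    i.e. the k source projects most similar to the target project."""
    ranked = sorted(sources, key=lambda name: mmd_squared(sources[name], target, gamma))
    return ranked[:k]


def ensemble_predict(predictors: List[Callable[[Vector], float]],
                     x: Vector,
                     threshold: float = 0.5) -> int:
    """Average the per-source defect probabilities and threshold the mean.

    Each predictor stands in for one CNN trained on one selected source
    (assumption: probability averaging as the integration rule).
    """
    avg = sum(p(x) for p in predictors) / len(predictors)
    return int(avg >= threshold)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1-score for the defective (positive = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with a target sample near the origin, a source sample that overlaps it ranks ahead of one about one unit away, which in turn ranks ahead of one far away, so `select_top_k_sources(target, sources, 2)` keeps the two closest sources. Averaging probabilities is only one possible integration scheme; majority voting or weighting each predictor by its source's similarity would be equally plausible readings of the abstract.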

Full Text