Abstract

Most software defect prediction models assume that enough labeled historical training instances are available, and that the training data and the instances to be predicted share the same features. In practice, however, many datasets differ in granularity and describe software in different feature dimensions. It is therefore valuable to make effective use of such small-scale, heterogeneous data as training instances to improve prediction performance. We propose a multiview transfer learning method for software defect prediction that is oriented toward heterogeneous data, denoted MTDP, which uses neural network models to learn labels automatically from features of different dimensions and granularities. This multiview transfer method provides many training instances for the software defect prediction model while ensuring the effectiveness of their labels. The proposed MTDP method has four main stages: (1) build heterogeneous transfer models; (2) transfer heterogeneous instances to generate quasi-real instances; (3) label the quasi-real instances through co-training and then expand the training set; and (4) construct improved software defect prediction models. The experimental results show that quasi-real instances have effects similar to those of real instances, and that introducing them into the training dataset improves software defect prediction performance.
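To make the co-training step in stage (3) concrete, the following is a minimal sketch and not the MTDP implementation. It assumes each instance is described by two feature views and uses a nearest-centroid classifier as a simple stand-in for the neural network models. The names `centroids`, `predict`, `co_train`, `rounds`, and `per_view` are hypothetical and chosen only for illustration.

```python
from math import dist


def centroids(rows, labels):
    """Mean feature vector per class (0 = clean, 1 = defective)."""
    out = {}
    for c in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == c]
        out[c] = [sum(col) / len(members) for col in zip(*members)]
    return out


def predict(cents, row):
    """Return (label, margin); a larger margin means a more confident guess."""
    d0, d1 = dist(row, cents[0]), dist(row, cents[1])
    return (0 if d0 <= d1 else 1), abs(d0 - d1)


def co_train(lab_a, lab_b, y, unl_a, unl_b, rounds=10, per_view=2):
    """Label unlabeled two-view instances; returns {unlabeled index: label}.

    Assumption: the initial labeled set contains both classes.
    """
    lab_a, lab_b, y = list(lab_a), list(lab_b), list(y)
    remaining = list(range(len(unl_a)))
    assigned = {}
    for _ in range(rounds):
        if not remaining:
            break
        picks = {}
        # Each view trains on the shared labeled pool and nominates the
        # unlabeled instances it is most confident about.
        for lab, unl in ((lab_a, unl_a), (lab_b, unl_b)):
            cents = centroids(lab, y)
            scored = [(predict(cents, unl[i]), i) for i in remaining]
            scored.sort(key=lambda s: -s[0][1])  # most confident first
            for (label, _), i in scored[:per_view]:
                picks.setdefault(i, label)  # first view to nominate wins
        # Nominated instances join the labeled pool of both views.
        for i, label in picks.items():
            lab_a.append(unl_a[i])
            lab_b.append(unl_b[i])
            y.append(label)
            assigned[i] = label
        remaining = [i for i in remaining if i not in picks]
    return assigned
```

In this sketch the expanded labeled pool (the original instances plus the newly labeled ones) is what a downstream defect predictor would be trained on, corresponding loosely to stage (4).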
