Cross-project software defect prediction based on domain adaptation learning and optimization

Cong Jin

doi:10.1016/j.eswa.2021.114637

Abstract

Software defect prediction (SDP) helps optimize the allocation of software testing resources and improve the quality of software products. A machine-learning-based cross-project defect prediction (CPDP) model is first trained on one project that has a sufficient amount of labeled defect data, and is then used to predict the defect labels of a new project that has little data and few labels. However, its prediction performance lags well behind that of within-project defect prediction (WPDP) models. The main reason is that the distributions of training data usually differ between software projects, and this difference strongly affects the prediction performance of the CPDP model. To solve this problem, kernel twin support vector machines (KTSVMs) are used to implement domain adaptation (DA), which matches the distributions of training data across projects. The KTSVMs with the DA function (called DA-KTSVM) are then used as the CPDP model in this paper. Since the parameters of DA-KTSVM affect its predictive performance, they are optimized by an improved quantum particle swarm optimization algorithm (IQPSO), and the optimized DA-KTSVM is called DA-KTSVMO. To confirm the effectiveness of DA-KTSVMO, experiments are conducted on 17 open-source software projects. The experimental results and analysis show that DA-KTSVMO not only achieves better prediction performance than the other CPDP models compared, but also achieves performance comparable to or better than that of WPDP models when the training sample data is sufficient. In addition, DA-KTSVMO makes better use of the knowledge in existing, sufficient data and reuses defect data to improve prediction performance.

Full Text