Performance Evaluation of Convolutional Neural Network for Multi-Class in Cross Project Defect Prediction

Sundas Noreen, Rizwan Bin Faiz, Mohamed Maddeh, Sultan Alyahya

doi:10.3390/app122312269

Open Access

Abstract

Cross-project defect prediction (CPDP) is a practical approach for finding software defects in projects that have incomplete or limited data. Improving the defect prediction accuracy of CPDP on data such as the PROMISE repository remains an ongoing challenge: it requires correctly classifying the source data, removing noise, reducing the distribution gap, balancing the output classes, and selecting an optimal feature set. This paper aims to achieve higher defect prediction accuracy for multi-class CPDP by selecting an optimal feature set through XGBoost, combined with automatic feature extraction using a convolutional neural network (CNN). The research type is explanatory and the research method is controlled experimentation, in which the dependent variable, prediction accuracy, depends on two independent variables, XGBoost and CNN. A Softmax layer was added to the output layers of the CNN classifier to classify the output into multiple classes. In our CPDP experiments, we used all 28 multi-class versions, of which 11 were selected as source projects, and predicted the 28 target versions with an average AUC of 75.57%. We validated the results with the Wilcoxon test. Therefore, after removing noise, class imbalance, and the data distribution gap, and treating the PROMISE dataset as multi-class, the optimal features selected through XGBoost and classified through the CNN can substantially increase prediction accuracy in CPDP, as is evident from our exploratory data analysis (EDA).
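
As a minimal illustration of two ideas named in the abstract, the sketch below shows how a Softmax output turns raw class scores into class probabilities, and how an average AUC can be computed for a multi-class problem. It is not the authors' implementation: the logits, the three defect classes, the helper names, and the choice of macro-averaged one-vs-rest AUC are all assumptions made for illustration, since the abstract does not state how the AUC was averaged.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(prob_rows, true_classes, n_classes):
    """Average the one-vs-rest AUC over all classes (an assumed convention)."""
    aucs = []
    for c in range(n_classes):
        scores = [row[c] for row in prob_rows]
        labels = [1 if t == c else 0 for t in true_classes]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / n_classes


# Hypothetical logits for four software modules and three defect classes.
logits = [
    [2.0, 0.5, 0.1],
    [0.2, 1.5, 0.3],
    [0.1, 0.4, 2.2],
    [1.0, 0.9, 0.2],
]
true_classes = [0, 1, 2, 0]

probs = [softmax(row) for row in logits]
predicted = [row.index(max(row)) for row in probs]  # most probable class
auc = macro_ovr_auc(probs, true_classes, n_classes=3)
print(predicted, round(auc, 4))
```

In a real experiment the logits would come from the trained CNN rather than being typed in by hand, and a library routine would normally replace the hand-written AUC.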

Full Text