Abstract

Cross-project defect prediction (CPDP) is a practical approach for finding software defects in projects that have incomplete or limited data. Improving the defect prediction accuracy of CPDP, for example on the PROMISE repository, through correct classification of the source data, noise removal, reduction of the distribution gap, and balancing of the output classes remains an ongoing challenge, as does the selection of an optimal feature set. This paper aims to achieve higher defect prediction accuracy for multi-class CPDP by selecting an optimal feature set through XGBoost, combined with automatic feature extraction using a convolutional neural network (CNN). The research is explanatory in type and uses controlled experimentation, in which the dependent variable, prediction accuracy, depends on two independent variables, XGBoost and CNN. A Softmax layer was added to the output layers of the CNN classifier to classify the output into multiple classes. In our CPDP experiments, we used all 28 multi-class project versions; 11 versions served as source projects, against which we predicted the 28 target versions with an average AUC of 75.57%. We validated the results with the Wilcoxon test. Our results indicate that, after removing noise, class imbalance, and the data distribution gap, and treating the PROMISE dataset as multi-class, the optimal features selected through XGBoost and classified through the CNN can substantially increase prediction accuracy in CPDP, as is evident from our exploratory data analysis (EDA).
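To illustrate the multi-class output step mentioned above, the following is a minimal sketch of a softmax function in plain Python. It is not the authors' implementation: the function name `softmax`, the three example scores, and the number of classes are assumptions chosen only for illustration.

```python
import math


def softmax(logits):
    """Map raw class scores to probabilities that sum to 1."""
    # Subtracting the maximum score avoids overflow in exp() without
    # changing the resulting probabilities.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three defect classes (illustrative values only,
# not taken from the paper).
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# The predicted class is the index with the highest probability.
predicted_class = probs.index(max(probs))
```

In a CNN classifier, a layer of this kind typically sits after the last dense layer, so that each module is assigned a probability for every defect class rather than a single defective or non-defective score.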
