Abstract

Currently, the cost of detecting and fixing software defects is a heavy burden on software projects, so it is important to predict defects in the early stages of the software development lifecycle. In this study, seven commonly used machine learning and deep learning algorithms were evaluated on defect classification using four representative public datasets from NASA and the PROMISE repository. In addition, three classical ensemble learning methods (Bagging, Boosting, and Stacking) were applied to improve prediction performance. Performance was evaluated with six metrics: accuracy, precision, recall, F1-score, the area under the receiver operating characteristic curve (AUC), and G-Mean. Ensemble learning outperformed all seven individual algorithms, achieving the highest AUC (0.99), the highest G-Mean (0.96), and an average F1-score of 0.97. In time-sensitive scenarios, Boosting was a good choice: it required less runtime than the other two ensemble methods while achieving similar performance in most cases.
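Of the six metrics, G-Mean is the least familiar: it is the geometric mean of recall (true positive rate) and specificity (true negative rate), so it drops sharply when a classifier neglects the minority "defective" class, a common situation in defect datasets. The sketch below shows one way to compute all six metrics for binary defect prediction in plain Python. It assumes labels 1 = defective and 0 = clean, a per-module score for the defective class, and a 0.5 decision threshold; the function names and toy data are hypothetical and not taken from the study. AUC is computed through its rank interpretation (the probability that a defective module outscores a clean one).

```python
import math


def roc_auc(y_true, y_score):
    """AUC as the probability that a randomly chosen defective module
    receives a higher score than a randomly chosen clean one (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean module")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def defect_metrics(y_true, y_pred, y_score):
    """Six evaluation metrics for binary defect prediction (1 = defective, 0 = clean)."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts, treating "defective" as the positive class.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0     # true negative rate
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": roc_auc(y_true, y_score),
        # G-Mean: geometric mean of recall and specificity.
        "g_mean": math.sqrt(recall * specificity),
    }


# Hypothetical toy data: 8 modules, 3 of them defective.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_score = [0.9, 0.4, 0.3, 0.6, 0.1, 0.8, 0.2, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed threshold of 0.5

metrics = defect_metrics(y_true, y_pred, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On this toy data the sketch reports an accuracy of 0.75 and an AUC of about 0.93; the one misclassified defective module pulls recall, and therefore G-Mean (about 0.73), well below accuracy. The pairwise AUC loop is quadratic in the number of modules and is meant only to make the definition concrete; for real datasets a library routine such as scikit-learn's `roc_auc_score` is the practical choice.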
