Abstract

Software defect prediction is an important discipline in the software development life cycle. Among the techniques employed in this domain, the Naive Bayes (NB) classifier is cited by a large number of researchers for its simple structure and remarkable classification performance, notwithstanding the concern of whether its use is theoretically justified. Specifically, NB is built on the strong assumption that attributes are conditionally independent given the class, and the major question is whether software metrics comply with this assumption. To address this question, we propose a novel framework, “MLMNB-SDP”, equipped with a statistical hypothesis testing method to detect software metrics with a significant conditional dependency. MLMNB-SDP handles conditional dependencies via a single latent variable in a predefined structure, which preserves the connection between pairs of software metrics when the class variables are instantiated. We evaluate the effectiveness of our approach based on its capability to measure the conditional dependency of software metrics and on its defect prediction performance. For the former, we employ Conditional Mutual Information (CMI); for the latter, we use three settings for defect prediction: (1) Within-Project Defect Prediction (WPDP), (2) Cross-Project Defect Prediction (CPDP), and (3) stratified k-fold cross-validation. Our metrics dependency analysis indicates that traditional file-level software metrics exhibit significant conditional mutual dependency, and that the application of the naive Bayes classifier in this domain is therefore not theoretically acceptable.
Our results across the three settings indicate that MLMNB-SDP improves on the naive Bayes classifier by 5.45% to 75.86% and outperforms well-known benchmark classifiers, i.e., Random Forest and Logistic Regression, with significant increases in Precision, Recall, F1 score, Matthews Correlation Coefficient (MCC), and area under the ROC curve (AUC).
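
To make the dependency measure concrete, the following is a minimal sketch of how Conditional Mutual Information between two metrics, given the defect class, could be estimated. The abstract does not specify the estimator, so this sketch assumes a simple plug-in (frequency-count) estimate over metrics that have already been discretized. The function name `conditional_mutual_information` and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def conditional_mutual_information(x, y, c):
    """Plug-in estimate of I(X; Y | C) in bits.

    x, y: sequences of discrete (already binned) metric values.
    c:    sequence of class labels (e.g. 0 = clean, 1 = defective).
    Assumption: this frequency-count estimator is illustrative only;
    it is not necessarily the estimator used by MLMNB-SDP.
    """
    n = len(x)
    if n == 0 or len(y) != n or len(c) != n:
        raise ValueError("x, y and c must be non-empty and of equal length")

    # Joint and marginal counts needed for the CMI formula.
    n_xyc = Counter(zip(x, y, c))
    n_xc = Counter(zip(x, c))
    n_yc = Counter(zip(y, c))
    n_c = Counter(c)

    cmi = 0.0
    for (xi, yi, ci), count in n_xyc.items():
        # p(x,y,c) * log2( p(c) * p(x,y,c) / (p(x,c) * p(y,c)) )
        ratio = count * n_c[ci] / (n_xc[(xi, ci)] * n_yc[(yi, ci)])
        cmi += (count / n) * math.log2(ratio)
    return cmi


# Toy data (hypothetical): two binary metrics and a binary class label.
# Case 1: within each class, y always equals x (strong conditional dependency).
dependent_cmi = conditional_mutual_information(
    [0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]
)

# Case 2: within each class, every (x, y) combination occurs equally often
# (conditional independence, which is what naive Bayes assumes).
independent_cmi = conditional_mutual_information(
    [0, 0, 1, 1] * 2, [0, 1, 0, 1] * 2, [0] * 4 + [1] * 4
)

print(dependent_cmi, independent_cmi)
```

A CMI of zero is consistent with the naive Bayes independence assumption for that pair of metrics, while larger values indicate that the pair remains informative about each other even after the class is known. On real data, a plug-in estimate is biased upward for small samples, which is one reason a hypothesis test is needed before declaring a dependency significant.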
