Commit-time defect prediction using one-class classification

Mohammed A Shehab, Wael Khreich, Abdelwahab Hamou-Lhadj, Issam Sedki

doi:10.1016/j.jss.2023.111914

Abstract

Existing Just-In-Time Software Defect Prediction methods suffer from the data imbalance problem, where the majority class (normal commits) significantly outnumbers the minority class (buggy commits). This imbalance raises the probability of misclassification. Various data balancing techniques have been proposed to address this challenge, with varying degrees of success. In this study, we propose an approach that relies on One-Class Classification (OCC) to train models using data from the majority class only, which eliminates the need for data balancing. We compare the accuracy of three OCC algorithms (One-class SVM, Isolation Forest, and One-class k-NN) with that of their binary counterparts (SVM, Random Forest, and k-NN) on 34 software projects. Our results show that the data imbalance ratio (IR), i.e., the ratio of normal to buggy commits, plays a crucial role in determining the optimal classification approach. We found that for projects with a medium to high IR, OCC algorithms outperform binary classifiers, both with and without data balancing, under both cross-validation and time-sensitive validation. Furthermore, we found that OCC methods require fewer features for projects with a medium to high IR, which reduces the computational overhead of training and the response time while providing a better understanding of the data and of the algorithms' behavior.
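The core idea of OCC, fitting a model on normal commits only and flagging anything that falls outside the region they occupy as potentially buggy, can be illustrated with a short sketch. This is a minimal, dependency-free example assuming one simple one-class k-NN variant (a commit's score is its Euclidean distance to the k-th nearest normal commit, and the decision threshold is a quantile of leave-one-out scores on the training data); it is not necessarily the formulation used in the study. The class `OneClassKNN`, the helper `imbalance_ratio`, and the toy two-feature commit vectors are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two commit feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def imbalance_ratio(labels: Sequence[int]) -> float:
    """Hypothetical helper: IR = #normal / #buggy (label 0 = normal, 1 = buggy)."""
    buggy = sum(1 for y in labels if y == 1)
    normal = len(labels) - buggy
    return normal / buggy if buggy else math.inf


class OneClassKNN:
    """Hypothetical one-class k-NN: fitted on normal (majority-class) commits only."""

    def __init__(self, k: int = 3, quantile: float = 0.95):
        self.k = k
        self.quantile = quantile
        self.train: List[Vector] = []
        self.threshold = 0.0

    def _kth_distance(self, x: Vector, exclude_index: Optional[int] = None) -> float:
        # Distance from x to its k-th nearest training commit, optionally
        # skipping x itself (leave-one-out scoring during fit).
        dists = sorted(
            euclidean(x, t) for i, t in enumerate(self.train) if i != exclude_index
        )
        return dists[self.k - 1]

    def fit(self, normal_commits: Sequence[Vector]) -> "OneClassKNN":
        self.train = list(normal_commits)
        if len(self.train) <= self.k:
            raise ValueError("need more than k normal commits to fit")
        # Leave-one-out scores of the normal commits define what "normal" looks like.
        scores = sorted(self._kth_distance(x, i) for i, x in enumerate(self.train))
        # Nearest-rank quantile: commits scoring above it are treated as outliers.
        idx = max(0, math.ceil(self.quantile * len(scores)) - 1)
        self.threshold = scores[idx]
        return self

    def predict(self, commits: Sequence[Vector]) -> List[int]:
        # 1 = predicted buggy (outlier w.r.t. normal commits), 0 = predicted normal.
        return [1 if self._kth_distance(x) > self.threshold else 0 for x in commits]


if __name__ == "__main__":
    # Toy normal commits; no buggy commits are needed for training.
    normal = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8), (1.0, 1.3)]
    model = OneClassKNN(k=2, quantile=0.9).fit(normal)
    print(model.predict([(1.05, 1.0), (5.0, 6.0)]))  # [0, 1]
    print(imbalance_ratio([0, 0, 0, 0, 0, 0, 1, 1]))  # 3.0
```

Because the buggy commits never enter `fit`, no resampling step is required regardless of the IR, which is the property the abstract highlights. In a real pipeline one would more likely use library implementations such as scikit-learn's `OneClassSVM` and `IsolationForest`, whose `predict` returns `+1` for inliers and `-1` for outliers.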
