An Unsupervised Software Fault Prediction Approach Using Threshold Derivation

Rakesh Kumar, Amrita Chaturvedi, Lakshmanan Kailasam

doi:10.1109/tr.2022.3151125

Abstract

Software fault prediction models help the software quality assurance team allocate resources optimally during software maintenance. Most recently proposed fault prediction approaches work only on labeled datasets. Several threshold-based software fault prediction approaches have also been proposed recently. However, these approaches do not take the distribution of software metrics into account when deriving metric thresholds, and they therefore perform poorly. To fill this gap, we develop an automated fault prediction approach, namely threshold clustering labeling (TCL)/threshold clustering labeling plus (TCLP), which does not need a labeled dataset. It identifies faulty and nonfaulty artifacts in unlabeled datasets by self-learning. Our proposed approach extends the state-of-the-art technique known as CLAMI. Unlike CLAMI, we derive metric thresholds using a logarithmic transformation. We then label the instances into binary classes (faulty/nonfaulty) using these metric threshold values. TCLP goes one step further by performing fault prediction with a random forest algorithm. An empirical evaluation on 28 datasets (with different numbers of metrics and levels of granularity) collected from five software groups shows that the proposed unsupervised method obtains significantly better results than the state-of-the-art methods. The proposed approach substantially improves on CLAMI in terms of accuracy, F-measure, and Matthews correlation coefficient.
