Abstract

Software fault prediction models are used to identify faulty modules at a very early stage of the software development life cycle. Predicting fault proneness using source code metrics is an area that has attracted considerable research attention. The performance of a fault-proneness model depends on the source code metrics used as its input. In this work, we propose a framework to validate source code metrics and identify a suitable subset of them, with the aim of removing irrelevant features and improving the performance of the fault prediction model. First, we apply a t-test and univariate logistic regression analysis to each source code metric to evaluate its potential for predicting fault proneness. Next, we perform a correlation analysis and stepwise forward selection with multivariate linear regression to find the right set of source code metrics for fault prediction. The selected metrics serve as input to fault prediction models developed using a neural network with five different training algorithms and three different ensemble methods. The effectiveness of the developed fault prediction models is evaluated using a proposed cost evaluation framework. We performed experiments on fifty-six open-source Java projects. The experimental results reveal that the model developed using the set of source code metrics selected by the proposed validation framework achieves better results than models using all other metric sets. The results also show that the fault prediction model is best suited for projects whose percentage of faulty classes is below a threshold that depends on the fault identification efficiency (low: 48.89%, median: 39.26%, high: 27.86%).
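To make the metric-validation pipeline concrete, the following is a minimal Python sketch of the steps summarized above. It is purely illustrative and not the paper's exact procedure: the synthetic data, the 0.05 significance and 0.7 correlation thresholds, and the use of scikit-learn's SequentialFeatureSelector and a single MLPClassifier (in place of the five training algorithms and three ensemble methods) are assumptions made for the example.

```python
# Illustrative sketch of the metric selection pipeline (assumed thresholds and data).
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neural_network import MLPClassifier

# Hypothetical data: rows = classes, columns = source code metrics,
# y = 1 if the class is faulty.
X, y = make_classification(n_samples=500, n_features=20, n_informative=6,
                           random_state=0)

# Step 1: t-test (faulty vs. non-faulty groups) and univariate logistic
# regression per metric; keep metrics that look individually predictive.
baseline = max(y.mean(), 1 - y.mean())  # majority-class accuracy
candidates = []
for j in range(X.shape[1]):
    _, p_ttest = stats.ttest_ind(X[y == 1, j], X[y == 0, j])
    uni_lr = LogisticRegression().fit(X[:, [j]], y)
    if p_ttest < 0.05 and uni_lr.score(X[:, [j]], y) > baseline:
        candidates.append(j)

# Step 2: correlation analysis -- drop a metric if it is highly correlated
# with one already selected.
selected = []
for j in candidates:
    if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < 0.7 for k in selected):
        selected.append(j)

# Step 3: stepwise forward selection with a multivariate linear model.
if len(selected) > 5:
    sfs = SequentialFeatureSelector(LinearRegression(), direction="forward",
                                    n_features_to_select=5)
    sfs.fit(X[:, selected], y)
    final = [selected[i] for i in np.flatnonzero(sfs.get_support())]
else:
    final = selected

# Step 4: train a neural-network fault prediction model on the chosen metrics.
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                      random_state=0).fit(X[:, final], y)
print("selected metric indices:", final)
print("training accuracy:", model.score(X[:, final], y))
```

In practice the input would be per-class metric values mined from the Java projects, and the trained models would be compared under the cost evaluation framework rather than by raw accuracy.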
