Investigating fault prediction capabilities of five prediction models for software quality

Deepak Banthia, Atul Gupta

doi:10.1145/2245276.2231975

Abstract

Predicting faults in software modules can lead to a higher-quality and more effective software development process. However, the results of a fault prediction model have to be properly interpreted before they are incorporated into any decision making. Most earlier studies have used prediction accuracy as the main criterion for comparing competing fault prediction models. We show that, besides accuracy, other criteria such as the number of false positives and false negatives can be equally important when choosing a candidate model for fault prediction. We used five NASA software data sets in our experiment. Our results suggest that Simple Logistic performs better than the other models on the raw data sets, whereas Neural Network performed better when we applied a dimensionality reduction method to the raw data sets. When we used data pre-processing techniques, the prediction accuracy of Random Forest was better in both cases, i.e., with and without dimensionality reduction, but Simple Logistic was more reliable than Random Forest because it produced fewer false negatives.
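
To illustrate why accuracy alone may not separate competing models, the following minimal sketch compares two hypothetical classifiers on ten made-up module labels (1 = faulty, 0 = not faulty). The labels, the predictions, and the names `Confusion`, `confusion`, `model_a`, and `model_b` are assumptions made for illustration only; they are not data or results from the study.

```python
from typing import NamedTuple, Sequence


class Confusion(NamedTuple):
    """Confusion-matrix counts for a binary fault predictor."""
    tp: int  # faulty modules predicted faulty
    fp: int  # clean modules predicted faulty (false positives)
    tn: int  # clean modules predicted clean
    fn: int  # faulty modules predicted clean (false negatives)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


def confusion(actual: Sequence[int], predicted: Sequence[int]) -> Confusion:
    """Count TP, FP, TN, FN for labels where 1 = faulty and 0 = not faulty."""
    pairs = list(zip(actual, predicted))
    return Confusion(
        tp=sum(1 for a, p in pairs if a == 1 and p == 1),
        fp=sum(1 for a, p in pairs if a == 0 and p == 1),
        tn=sum(1 for a, p in pairs if a == 0 and p == 0),
        fn=sum(1 for a, p in pairs if a == 1 and p == 0),
    )


# Hypothetical ground truth: 3 faulty modules out of 10.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical model A: conservative, misses two faulty modules.
model_a = confusion(actual, [1, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# Hypothetical model B: flags every faulty module, with two false alarms.
model_b = confusion(actual, [1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

for name, result in (("A", model_a), ("B", model_b)):
    print(f"model {name}: accuracy={result.accuracy:.2f} "
          f"false_positives={result.fp} false_negatives={result.fn}")
```

Both hypothetical models reach the same accuracy, yet model A lets two faulty modules go undetected while model B instead raises two false alarms. Which error is more costly depends on the project, which is why the false positive and false negative counts are worth reporting alongside accuracy.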
