Predicting failure-proneness in an evolving software product line

Sandeep Krishnan,Chris Strasburg,Robyn R Lutz,Katerina Goseva-Popstojanova,Karin S Dorman

doi:10.1016/j.infsof.2012.11.008

Abstract

ContextPrevious work by researchers on 3years of early data for an Eclipse product has identified some predictors of failure-prone files that work well. Eclipse has also been used previously by researchers to study characteristics of product line software. ObjectiveThe work reported here investigates whether classification-based prediction of failure-prone files improves as the product line evolves. MethodThis investigation first repeats, to the extent possible, the previous study and then extends it by including four more recent years of data, comparing the prominent predictors with the previous results. The research then looks at the data for three additional Eclipse products as they evolve over time. The analysis compares results from three different types of datasets with alternative data collection and prediction periods. ResultsOur experiments with a variety of learners show that the difference between the performance of J48, used in this work, and the other top learners is not statistically significant. Furthermore, new results show that the effectiveness of classification significantly depends on the data collection period and prediction period. The study identifies change metrics that are prominent predictors across all four releases of all four products in the product line for the three different types of datasets. From the product line perspective, prediction of failure-prone files for the four products studied in the Eclipse product line shows statistically significant improvement in accuracy but not in recall across releases. ConclusionAs the product line matures, the learner performance improves significantly for two of the three datasets, but not for prediction of post-release failure-prone files using only pre-release change data. This suggests that it may be difficult to detect failure-prone files in the evolving product line. At least in part, this may be due to the continuous change, even for commonalities and high-reuse variation components, which we previously have shown to exist.

Full Text