Abstract

Predicting which areas of the source code are more likely to change in the future is an important activity that allows developers to plan preventive maintenance operations. For this reason, several change prediction models have been proposed. Moreover, the research community has shown both that the choice of classifier affects the performance of these models and that different classifiers, while performing similarly overall, correctly predict the change proneness of different code elements, possibly indicating some complementarity among them. In this paper, we investigate more deeply whether ensemble approaches, i.e., machine learning techniques that combine multiple classifiers, can improve the performance of change prediction models. Specifically, we built three change prediction models based on different predictors, i.e., product metrics, process metrics, and developer-related factors, comparing the performance of four ensemble techniques (i.e., Boosting, Random Forest, Bagging, and Voting) with that of standard machine learning classifiers (i.e., Logistic Regression, Naive Bayes, Simple Logistic, and Multilayer Perceptron). The study was conducted on 33 releases of 10 open-source systems, and the results show that ensemble methods, and in particular Random Forest, provide a significant improvement of more than 10% in terms of F-measure; the statistical analyses conducted confirm the superiority of this ensemble technique. Moreover, the model built using developer-related factors performed better than the models exploiting product and process metrics, achieving an overall median F-measure of around 77%.
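The kind of comparison the abstract describes can be sketched as follows. This is a minimal, hypothetical illustration using scikit-learn and synthetic data, not the study's actual datasets, predictors, or evaluation protocol: it pits one ensemble technique (Random Forest) against one standard classifier (Logistic Regression) and scores both by F-measure.

```python
# Hypothetical sketch of an ensemble-vs-standard-classifier comparison
# scored by F-measure. Synthetic data stands in for change-proneness
# datasets (code metrics as features, "changed in next release" as label).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10 numeric predictors, binary change label.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

scores = {}
for name, clf in [("Random Forest", RandomForestClassifier(random_state=42)),
                  ("Logistic Regression", LogisticRegression(max_iter=1000))]:
    clf.fit(X_train, y_train)
    scores[name] = f1_score(y_test, clf.predict(X_test))
    print(f"{name}: F-measure = {scores[name]:.2f}")
```

The real study additionally uses multiple releases per system, several predictor families, and statistical tests over the resulting distributions of F-measures; this fragment only shows the core fit/predict/score loop.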
