On the prediction of long-lived bugs: An analysis and comparative study using FLOSS projects

Luiz Alberto Ferreira Gomes,Mario Lúcio Côrtes,Ricardo Da Silva Torres

doi:10.1016/j.infsof.2020.106508

Abstract

Software evolution and maintenance activities in today’s Free/Libre Open Source Software (FLOSS) rely primarily on information extracted from bug reports registered in bug tracking systems. Many studies point out that most bugs that adversely affect the user’s experience across versions of FLOSS projects are long-lived bugs. However, proposed approaches that support bug fixing procedures do not consider the real-world lifecycle of a bug, in which bugs are often fixed very fast. This may lead to useless efforts to automate the bug management process. This study aims to confirm whether the number of long-lived bugs is significantly high in popular open-source projects and to characterize the population of long-lived bugs by considering the attributes of bug reports. We also aim to conduct a comparative study evaluating the prediction accuracy of five well-known machine learning algorithms and text mining techniques in the task of predicting long-lived bugs. We collected bug reports from six popular open-source projects repositories (Eclipse, Freedesktop, Gnome, GCC, Mozilla, and WineHQ) and used the following machine learning algorithms to predict long-lived bugs: K-Nearest Neighbor, Naïve Bayes, Neural Networks, Random Forest, and Support Vector Machines. Our results show that long-lived bugs are relatively frequent (varying from 7.2% to 40.7%) and have unique characteristics, confirming the need to study solutions to support bug fixing management. We found that the Neural Network classifier yielded the best results in comparison to the other algorithms evaluated. Research efforts regarding long-lived bugs are needed and our results demonstrate that it is possible to predict long-lived bugs with a high accuracy (around 70.7%) despite the use of simple prediction algorithms and text mining methods.

Full Text