Leveraging developer information for efficient effort-aware bug prediction

Yu Qu,Jianlei Chi,Heng Yin

doi:10.1016/j.infsof.2021.106605

Yu Qu, Jianlei Chi + Show 1 more

Open Access

PDF Available

https://doi.org/10.1016/j.infsof.2021.106605

Copy DOI

Export

Save

Cite

Abstract
Full-Text PDF
Similar Papers

Abstract

Listen

Software bug prediction techniques can provide informative guidance in software engineering practices. Over the past 15 years, developer information has been intensively used in bug prediction as features or basic data source to construct other useful models. Further leverage developer information from a new and straightforward perspective to improve effort-aware bug prediction. We propose to investigate the direct relations between the number of developers and the probability for a file to be buggy. Based on an empirical study on nine open-source Java systems with 32 versions, we observe a widely-existed and interesting tendency: when there are more developers working on a source file, there will be a stronger possibility for this file to be buggy. Based on the observed tendency, we propose an unsupervised algorithm and a supervised equation both called top-dev to improve effort-aware bug prediction. The key idea is to prioritize the ranking of files, whose number of developers is large, in the suspicious file list generated by effort-aware models. Experimental results show that the proposed top-dev algorithm and equation significantly outperform the unsupervised and supervised baseline models (ManualUp, Rad, Rdd, Ree, CBS+, and top-core). Moreover, the unsupervised top-dev algorithm is comparable or superior to existing supervised baseline models. The proposed approaches are very useful in effort-aware bug prediction practices. Practitioners can use the top-dev algorithm to generate a high-quality and informative suspicious file list without training complex machine learning classifiers. On the other hand, when building supervised bug prediction model, the best practice is to combine existing models with the top-dev equation.

Full Text