Abstract
Fixing software bugs can be colossally expensive, especially if they are discovered in the later phases of the software development life cycle. As such, bug prediction has been a classic problem for the research community. As of now, the Google Scholar site generates ∼113,000 hits if searched with the “bug prediction” phrase. Despite this staggering effort by the research community, bug prediction research is criticized for not being decisively adopted in practice. A significant problem of the existing research is the granularity level (i.e., class/file level) at which bug prediction is historically studied. Practitioners find it difficult and time-consuming to locate bugs at the class/file level granularity. Consequently, method-level bug prediction has become popular in the past decade. We ask, are these method-level bug prediction models ready for industry use? Unfortunately, the answer is no . The reported high accuracies of these models dwindle significantly if we evaluate them in different realistic time-sensitive contexts. It may seem hopeless at first, but, encouragingly, we show that future method-level bug prediction can be improved significantly. In general, we show how to reliably evaluate future method-level bug prediction models and how to improve them by focusing on four different improvement avenues: building noise-free bug data, addressing concept drift, selecting similar training projects, and developing a mixture of models. Our findings are based on three publicly available method-level bug datasets and a newly built bug dataset of 774,051 Java methods originating from 49 open-source software projects.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: ACM Transactions on Software Engineering and Methodology
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.