An HMM-based approach for automatic detection and classification of duplicate bug reports

Neda Ebrahimi, Abdelwahab Hamou-Lhadj, Md Shariful Islam, Kobra Khanmohammadi, Abdelaziz Trabelsi

doi:10.1016/j.infsof.2019.05.007

Abstract

Context: Software projects rely on their issue tracking systems to guide the maintenance activities of software developers. Bug reports submitted to these systems carry crucial information about the nature of a crash, such as text from users or developers and execution information about the functions that were running before the crash occurred. Large software projects typically receive thousands of reports every day.

Objective: The aim is to reduce the time and effort required to fix bugs while improving overall software quality. Previous studies have shown that a large number of bug reports are duplicates of previously reported ones. For example, as many as 30% of all reports for Firefox are duplicates.

Method: While a wide variety of approaches exist for automatically detecting duplicate bug reports using natural language processing, only a few have considered the execution information (the so-called stack traces) inside bug reports. In this paper, we propose a novel approach that automatically detects duplicate bug reports using stack traces and Hidden Markov Models (HMMs).

Results: We apply our approach to Firefox and GNOME datasets. For Firefox, the average recall is 59% at rank k = 1 and 75.55% at k = 2, and it reaches 90% from k = 10. The Mean Average Precision (MAP) is up to 76.5%. For GNOME, the recall is around 63% at k = 1, increases by about 10% at k = 2, and reaches 97% at k = 11. The MAP is up to 73%.

Conclusion: We show that HMMs and stack traces are a powerful combination for detecting and classifying duplicate bug reports in large bug repositories.
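The results above are reported as recall at rank k and Mean Average Precision. As a rough illustration of how such ranking metrics are commonly computed, the following Python sketch scores a handful of made-up queries. It is not the authors' evaluation code: the function names, the toy rankings, and the assumption that each incoming report has exactly one correct duplicate group (so that average precision reduces to the reciprocal of the rank of the correct group) are all hypothetical.

```python
from typing import Hashable, Sequence


def recall_at_k(rankings: Sequence[Sequence[Hashable]],
                truths: Sequence[Hashable], k: int) -> float:
    """Fraction of queries whose correct group appears in the top k candidates."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


def mean_average_precision(rankings: Sequence[Sequence[Hashable]],
                           truths: Sequence[Hashable]) -> float:
    """MAP under the assumption of one correct group per query.

    With a single relevant item, average precision is 1 / rank of that item,
    or 0 if the item is missing from the ranked list.
    """
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        if truth in ranked:
            total += 1.0 / (list(ranked).index(truth) + 1)
    return total / len(truths)


# Toy data (invented): each inner list ranks candidate duplicate groups for one
# incoming report, best match first. The correct group is "A" for every query.
rankings = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["C", "B", "A"],
    ["B", "C", "A"],
]
truths = ["A", "A", "A", "A"]

for k in (1, 2, 3):
    print(f"recall@{k} = {recall_at_k(rankings, truths, k):.2f}")
print(f"MAP = {mean_average_precision(rankings, truths):.4f}")
```

In this toy setup, recall at k can only grow as k increases, which matches the pattern in the reported numbers: a larger candidate list gives the correct duplicate group more chances to appear.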

Full Text