Abstract

<i>Context:</i> Users and developers rely on bug tracking systems to report errors that occur during the development and testing of software. Manually identifying duplicate reports is a tedious task, especially for software with large bug repositories. In this context, automatic detection becomes necessary and can help prevent the same bug from being fixed repeatedly. <i>Objective:</i> In this article, we propose <i>BERT-MLP</i>, a novel pretrained language model using bidirectional encoder representations from transformers (BERT) for duplicate bug report detection (DBRD), with the aim of improving the detection rate compared to existing works. <i>Method:</i> Our approach considers only unstructured data, which are fed into the BERT model to learn the contextual relationships between words. The output is passed to a multilayer perceptron (MLP) classifier, forming our base DBRD model. <i>Results:</i> Our approach was evaluated on three projects: Mozilla Firefox, Eclipse Platform, and Thunderbird, achieving accuracies of 92.11, 94.08, and 89.03&#x0025;, respectively. A comparison with a dual-channel convolutional neural network (DC-CNN) model and other pretrained models, including RoBERTa and Sentence-BERT, was conducted. Results showed that <i>BERT-MLP</i> outperformed the second-best-performing models (DC-CNN and Sentence-BERT) by 12&#x0025; in accuracy for Eclipse and by 9&#x0025; for both Mozilla and Thunderbird.
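The encoder-plus-classifier pipeline described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: a small randomly initialized Transformer encoder stands in for the pretrained BERT model (which would normally be loaded from a checkpoint such as <i>bert-base-uncased</i>), and the layer sizes, vocabulary size, and input construction are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BertMlpDBRD(nn.Module):
    """Hypothetical sketch: a Transformer encoder (stand-in for pretrained
    BERT) whose pooled output feeds an MLP classifier for DBRD."""

    def __init__(self, vocab_size=30522, hidden=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # MLP head over the pooled representation; two logits
        # (duplicate / not duplicate).
        self.mlp = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))  # contextual token embeddings
        pooled = h[:, 0, :]                      # [CLS]-style first-token pooling
        return self.mlp(pooled)                  # classification logits

# Illustrative use: a pair of bug reports would be tokenized into one
# sequence; here random token ids stand in for real input.
model = BertMlpDBRD()
logits = model(torch.randint(0, 30522, (3, 16)))  # batch of 3, length 16
print(logits.shape)  # torch.Size([3, 2])
```

In practice the encoder would be initialized from pretrained BERT weights and fine-tuned jointly with the MLP head on labeled duplicate/non-duplicate report pairs.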
