Feature Comparison for Automatic Bug Report Classification

Bancha Luaphol, Tontrakant Kachai, Boonchoo Srikudkao, Poramin Bheganan, Jantima Polpinij, Natthakit Srikanjanapert

doi:10.1007/978-3-030-19861-9_7

Abstract

Various bug tracking systems (BTS), such as Jira, Trac, and Bugzilla, have been developed to gather issues from users worldwide. These issues, called bug reports, contain significant information for software quality maintenance and improvement. However, many poor-quality bug reports may be submitted to a BTS. In general, reported bugs are first analyzed and filtered by bug triagers, but with the increasing number of bug reports, manual classification is time-consuming. To address this problem, bugs must be distinguished from non-bugs automatically. To the best of our knowledge, this task remains difficult, because bug reports are still misclassified to date. The problem may arise from the use of inappropriate or confusing features. Therefore, this work aims to identify the most suitable features for binary bug report classification. The study compares seven feature sets: unigram, bigram, camel case, unigram+bigram, unigram+camel case, bigram+camel case, and all features together. The experimental results suggest that unigram+camel case is the most suitable feature set for binary bug report classification, especially when used with the logistic regression algorithm, and it is therefore the recommended feature set for distinguishing bug reports from non-bug reports.
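
As an illustration of the seven feature sets compared in the abstract, the following minimal sketch enumerates them from three base extractors. It is not the authors' implementation. The function names, the `name:token` feature keys, and in particular the definition of a camel-case token (a word containing a lowercase letter immediately followed by an uppercase letter) are assumptions made for this example; the resulting counts could then be passed to any classifier, such as logistic regression.

```python
import re
from collections import Counter
from itertools import combinations

WORD_RE = re.compile(r"[A-Za-z]+")
# Assumed definition: a token is camel case if a lowercase letter
# is immediately followed by an uppercase letter inside it.
CAMEL_RE = re.compile(r"[a-z][A-Z]")


def unigrams(text):
    """Lowercased alphabetic words."""
    return [w.lower() for w in WORD_RE.findall(text)]


def bigrams(text):
    """Pairs of adjacent lowercased words, joined with an underscore."""
    words = unigrams(text)
    return [f"{a}_{b}" for a, b in zip(words, words[1:])]


def camel_case(text):
    """Words kept in their original case that look like camel-case identifiers."""
    return [w for w in WORD_RE.findall(text) if CAMEL_RE.search(w)]


# Hypothetical registry of the three base feature types.
EXTRACTORS = {"unigram": unigrams, "bigram": bigrams, "camel": camel_case}


def feature_sets():
    """All non-empty combinations of the base types: 3 singles, 3 pairs, 1 triple."""
    names = list(EXTRACTORS)
    return [c for r in range(1, len(names) + 1) for c in combinations(names, r)]


def extract(text, combo):
    """Count features for one report under one feature-set combination."""
    counts = Counter()
    for name in combo:
        for token in EXTRACTORS[name](text):
            counts[f"{name}:{token}"] += 1
    return counts


report = "NullPointerException thrown in parseFile"
features = extract(report, ("unigram", "camel"))
```

Prefixing each key with its feature type keeps a unigram such as `parsefile` distinct from the camel-case token `parseFile` when the two sets are combined.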

Full Text