A method of non-bug report identification from bug report repository

Jantima Polpinij

doi:10.1007/s10015-021-00681-3

Abstract

One of the most common issues addressed by bug report studies is misclassification when identifying and then filtering non-bug reports from the bug report repository. Having to filter out unrelated reports wastes time in identifying actual bug reports, and this escalates costs because extra maintenance and effort are required to triage and fix bugs. To tackle this problem, this study proposes a method of automatically identifying non-bug reports in the bug report repository using classification techniques. Three points are considered. First, the bug report features are unigrams and CamelCase words, where CamelCase words are used for feature expansion. Second, five term weighting schemes are compared to determine an appropriate scheme for this task. Lastly, members of the support vector machine (SVM) family, i.e. binary-class SVM, one-class SVM based on the Schölkopf methodology, and support vector data description (SVDD), are used as the main mechanisms for modeling non-bug report identifiers. Evaluation by recall, precision, and F1 demonstrates the effectiveness of the method in identifying non-bug reports in the bug report repository. The results appear acceptable when compared with previous well-known studies, and the non-bug report identifiers using the tf-igm and modified tf-icf weighting schemes, with both the Schölkopf methodology and SVDD, yielded the best values among the configurations compared.
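The first point above, unigram features expanded with CamelCase words, can be illustrated with a minimal sketch. The abstract does not give the exact tokenization or splitting rules, so the regular expressions and the function names `is_camel_case` and `extract_features` below are assumptions made for illustration, not the author's implementation.

```python
import re
from collections import Counter

# Assumed splitting rule: break an identifier at lower-to-upper boundaries and
# keep acronym runs together ("XMLParser" -> "XML", "Parser").
_CAMEL_PARTS = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")
# Assumed unigram rule: a token starts with a letter, followed by letters or digits.
_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_camel_case(token):
    """True if the token mixes cases and has an uppercase letter after position 0."""
    return any(c.isupper() for c in token[1:]) and any(c.islower() for c in token)


def extract_features(text):
    """Return term counts: every unigram, plus the parts of each CamelCase word."""
    features = []
    for token in _TOKEN.findall(text):
        features.append(token.lower())  # the unigram itself
        if is_camel_case(token):
            # Feature expansion: add the component words of the CamelCase token.
            features.extend(part.lower() for part in _CAMEL_PARTS.findall(token))
    return Counter(features)


if __name__ == "__main__":
    counts = extract_features("NullPointerException thrown in XMLParser")
    for term, count in sorted(counts.items()):
        print(term, count)
```

In this sketch the original CamelCase token is kept alongside its parts, so a report mentioning `NullPointerException` also shares the terms `null`, `pointer`, and `exception` with reports that use plain words. The resulting counts would then be passed to a term weighting scheme and a classifier, which are not shown here.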

Full Text