Combining Imbalance Learning Strategy and Multiclassifier Estimator for Bug Report Classification

Shikai Guo, Miaomiao Wei, Hui Li, Rong Chen, Chen Guo, Siwen Wang

doi:10.1155/2020/5712461

Open Access

https://doi.org/10.1155/2020/5712461

Abstract

Because a large number of bug reports are submitted to bug repositories every day, efficiently assigning each report to the correct developer is a considerable challenge. Because components differ greatly across projects, current bug classification relies mainly on the component field of a bug report to dispatch it to the designated developer or developer community. Unfortunately, the component field is filled in by default according to the bug submitter, and the result is often incorrect. Thus, an automatic technology that can identify high-impact bug reports can help developers become aware of them early, rectify them quickly, and minimize the damage they cause. In this paper, we propose a method that combines imbalanced learning strategies, such as random undersampling (RUS), random oversampling (ROS), synthetic minority oversampling technique (SMOTE), and the AdaCost algorithm, with the multiclass classification methods one-versus-one (OVO) and one-versus-all (OVA) to solve the bug report component classification problem. We investigate the effectiveness of different combinations, i.e., variants, each of which pairs a specific imbalanced learning strategy with a specific classification algorithm. We perform an analytical study on five open bug repositories (Eclipse, Mozilla, GCC, OpenOffice, and NetBeans). The results show that the variants differ in their performance on bug report component identification and that the best-performing variant combines the RUS imbalanced learning strategy with the OVA method based on the SVM classifier.
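As a rough illustration of the two simplest resampling strategies named in the abstract, the following sketch rebalances a labeled dataset by random undersampling (RUS) and random oversampling (ROS). It is not the authors' implementation: the function names, the fixed seed, the choice to balance every class to the smallest or largest class size, and the toy report ids and component names are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def _group_by_label(samples, labels):
    """Map each class label to the list of samples carrying it."""
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    return groups


def random_undersample(samples, labels, seed=0):
    """RUS sketch: randomly keep only as many samples per class as the smallest class has."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = min(len(members) for members in groups.values())
    out_samples, out_labels = [], []
    for label, members in groups.items():
        for sample in rng.sample(members, target):
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def random_oversample(samples, labels, seed=0):
    """ROS sketch: randomly duplicate samples until each class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = max(len(members) for members in groups.values())
    out_samples, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for sample in members + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: report ids labeled with made-up component names.
reports = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"]
components = ["UI"] * 5 + ["Core"] * 2 + ["Debug"]

rus_reports, rus_components = random_undersample(reports, components)
ros_reports, ros_components = random_oversample(reports, components)

print(Counter(rus_components))  # every class reduced to the minority size
print(Counter(ros_components))  # every class raised to the majority size
```

RUS discards majority-class reports, while ROS repeats minority-class reports; SMOTE and AdaCost, which the abstract also lists, work differently and are not covered by this sketch.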

Highlights

We propose a method that combines imbalanced learning strategies, such as random undersampling (RUS), random oversampling (ROS), synthetic minority oversampling technique (SMOTE), and the AdaCost algorithm, with the multiclass classification methods OVO and OVA to solve the bug report component classification problem.
RQ1: Which classification method performs better when the OVO and OVA methods are built on naive Bayes multinomial (NBM), k-nearest neighbor (KNN), and support-vector machine (SVM) classifiers? To answer this question, we apply the OVO and OVA multiclass classification methods to the NBM, KNN, and SVM classifiers, which yields six variants (OVO with NBM, OVO with KNN, OVO with SVM, OVA with NBM, OVA with KNN, and OVA with SVM), and record the experimental results.
RQ2: What is the impact of imbalanced learning strategies on the multiclass OVO method in solving the bug report component allocation problem? This question explores whether an imbalanced learning strategy affects the OVO classification method.
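To make the difference between the two multiclass schemes concrete, the sketch below shows how a multiclass label list might be decomposed into binary tasks under OVA (one task per class) and OVO (one task per pair of classes), and enumerates the six scheme/classifier variants of RQ1. It is only an illustration: the helper names `ova_tasks` and `ovo_tasks`, the task representation, and the toy component labels are assumptions, and no classifier is trained here.

```python
from itertools import combinations


def ova_tasks(labels):
    """One-versus-all: one binary task per class (that class = 1, all others = 0)."""
    classes = sorted(set(labels))
    return [(cls, [1 if label == cls else 0 for label in labels]) for cls in classes]


def ovo_tasks(labels):
    """One-versus-one: one binary task per unordered pair of classes.

    Each task keeps only the indices of samples that belong to one of the two classes.
    """
    classes = sorted(set(labels))
    tasks = []
    for first, second in combinations(classes, 2):
        indices = [i for i, label in enumerate(labels) if label in (first, second)]
        targets = [1 if labels[i] == first else 0 for i in indices]
        tasks.append(((first, second), indices, targets))
    return tasks


# Hypothetical component labels for seven bug reports.
labels = ["UI", "UI", "UI", "Core", "Core", "Debug", "Doc"]

ova = ova_tasks(labels)
ovo = ovo_tasks(labels)
print(len(ova), len(ovo))  # k tasks for OVA, k * (k - 1) / 2 tasks for OVO

# The six variants of RQ1: each scheme paired with each base classifier.
variants = [(scheme, clf) for scheme in ("OVO", "OVA") for clf in ("NBM", "KNN", "SVM")]
print(variants)
```

In this toy example the OVA task for "Core" has two positive samples against five negative ones, which hints at why a resampling step before each binary task could matter.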

Summary

Introduction

Existing work, for example, [26, 27, 28, 29], uses text-based classification to assist in bug classification and to prevent misclassification when recommending the correct developer. In such an approach, the summary and description of the bug report are extracted as textual content, and the developer who can fix the bug serves as the class label. Because the number of bug reports submitted to the bug repository is very large, developers resolve as many high-impact and high-severity bug reports as possible during the bug classification process. A number of methods for predicting the severity labels of bug reports have also been proposed.
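The text-based setup described above can be sketched as follows: the summary and description are joined into one text, tokenized into word counts, and paired with the fixing developer as the label. The `BugReport` fields, the simple regular-expression tokenizer, and the sample reports are hypothetical choices for this illustration, not details taken from the cited work.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class BugReport:
    """Hypothetical minimal bug report record."""
    summary: str
    description: str
    fixer: str  # developer who fixed the bug, used as the class label


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_sample(report):
    """Turn a report into (bag-of-words features, label)."""
    text = f"{report.summary} {report.description}"
    return Counter(tokenize(text)), report.fixer


# Made-up reports for illustration only.
reports = [
    BugReport("Crash on save", "Editor crashes when saving a large file", "alice"),
    BugReport("Wrong tooltip", "Tooltip text is wrong in the toolbar", "bob"),
]

dataset = [to_sample(report) for report in reports]
for features, label in dataset:
    print(label, dict(features))
```

A real pipeline would likely add stop-word removal, stemming, and term weighting before training a classifier; those steps are omitted here.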

Methods

Results

Conclusion