CaPBug-A Framework for Automatic Bug Categorization and Prioritization Using NLP and Machine Learning Algorithms

Hafiza Anisa Ahmed, Jawwad Ahmed Shamsi, Narmeen Zakaria Bawany

doi:10.1109/access.2021.3069248

Open Access

Abstract

Bug reports help software development teams improve the quality of software. These reports include significant information about problems encountered in the software, possible enhancement suggestions, and other potential issues. Bug reports are typically complex and highly detailed; hence, analyzing and processing them manually requires considerable resources. Moreover, manual processing delays the resolution of high-priority bugs. Accurate and timely processing of bug reports based on their category and priority plays a significant role in improving the quality of software maintenance. Therefore, an automated process for categorizing and prioritizing bug reports is needed to address these issues. Automated categorization and prioritization of bug reports have been explored recently by many researchers; however, limited progress has been made in this regard. In this research, we present a novel framework, titled CaPBug, for automated categorization and prioritization of bug reports. The framework is implemented using Natural Language Processing (NLP) and supervised machine learning algorithms. A baseline corpus is built with six categories and five prioritization levels by analyzing more than 2000 bug reports from the Mozilla and Eclipse repositories. Four classification algorithms, i.e., Naive Bayes, Random Forest, Decision Tree, and Logistic Regression, have been used to categorize and prioritize bug reports. We demonstrate that the CaPBug framework achieved an accuracy of 88.78% by using a Random Forest classifier with a textual feature for predicting the category. Similarly, using the CaPBug framework, an accuracy of 90.43% was achieved in predicting the priority of bug reports. The Synthetic Minority Over-Sampling Technique (SMOTE) has been applied to address the class imbalance issue in priority classes.
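
The abstract names SMOTE as the remedy for class imbalance among priority levels but does not describe how it was applied. As a rough illustration of the general idea only, the following pure-Python sketch creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy feature vectors are assumptions made for this example and do not come from the paper; in practice a library implementation would normally be used.

```python
import random
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic samples from minority-class vectors.

    Hypothetical helper for illustration: each synthetic point lies on the
    line segment between a randomly chosen minority sample and one of its
    k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: squared_distance(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class (for example, numeric features of a rare priority level).
rare_priority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
extra_samples = smote(rare_priority, n_new=6, k=2)
```

Because every synthetic vector is a convex combination of two existing minority vectors, the new points stay inside the region already occupied by the minority class. Oversampling of this kind is normally applied to the training split only, so that the evaluation data remain untouched.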

Highlights

Software testing and maintenance are the most critical phases of software development
Category-wise results from the textual feature: Table 9 presents the accuracy of the classification algorithms for each category of bug reports predicted from a textual feature
With the Naive Bayes, Decision Tree, and Random Forest classifiers, the P4 priority level reached accuracies of 84% to 91%, whereas it achieved the lowest accuracy of 6.06% with Logistic Regression

Summary

Introduction

Software testing and maintenance are the most critical phases of software development. Bug reports play a vital role in these stages of development [1], [2]. A bug report is generated by the software quality assurance team while testing software modules. It contains detailed information about a specific component or problem that needs to be fixed [3]–[5]. The information in a bug report includes many attributes such as feature requests, functionality enhancement requests, code errors, logical errors, and compatibility issues. The report consists of several headings including priority, summary, description of the affected component,

The associate editor coordinating the review of this manuscript and approving it for publication was Ikramullah Lali.

Objectives

Results

Conclusion