Application of NLP to determine the State of Issues in Bug Tracking Systems

Matthias Pohl, Frederik Kramer, Ali Hashaam, Sascha Bosse, Matthias Volk, Klaus Turowski, Daniel Gunnar Staegemann

doi:10.1109/icdmw51313.2020.00017

Abstract

The amount of semi-structured textual data generated across the World Wide Web has drawn attention to the fields of natural language processing and text mining. With the continuous growth of IT companies, many new and diverse projects are being introduced. Hence, the need for project management tools that provide robust, efficient, and helpful services is rising. A bug tracking system is one such tool that makes the job of every stakeholder easier. This research utilizes the textual data provided by bug tracking systems, in the form of problem summaries, descriptions, and comments, to determine the state of project issues. The unlabeled textual data from the bug tracking system is subjected to a state-of-the-art class-expansion stage that automates the labeling of issues based on severity level and clustering (constraint-based and density-based). During this stage, 40 percent of the total data could be labeled. Semi-supervised learners are then introduced to learn from labeled as well as unlabeled data, and their results significantly outperform those of a supervised learner. However, for the semi-supervised learners, the recall for the critical class remains unsatisfactory at 53 percent. Since identifying critical tickets is of utmost importance for project managers, improving the recall for the critical class is essential. To address this, insights from sentiment analysis are introduced into the semi-supervised learners. As a result, the recall for the critical class improves up to 85 percent, which makes the proposed solution useful in cases where efficient classification between critical and non-critical classes is required.
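
Because the abstract's central figures are recall values for the critical class, it may help to make the metric concrete: recall is the share of truly critical tickets that the classifier also labels as critical, TP / (TP + FN). The following is a minimal sketch under stated assumptions, not the authors' evaluation code. The function name `class_recall` and the toy labels are hypothetical and invented for illustration; they are not data from the study.

```python
from typing import Sequence


def class_recall(y_true: Sequence[str], y_pred: Sequence[str], target: str) -> float:
    """Recall for one class: true positives / all actual members of the class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Tickets that really belong to `target` and were predicted as `target`.
    true_positives = sum(
        1 for t, p in zip(y_true, y_pred) if t == target and p == target
    )
    # All tickets that really belong to `target` (TP + FN).
    actual_positives = sum(1 for t in y_true if t == target)
    if actual_positives == 0:
        raise ValueError(f"no samples of class {target!r} in y_true")
    return true_positives / actual_positives


# Toy labels, invented for illustration only (not results from the paper).
y_true = ["critical", "critical", "critical", "critical", "non-critical", "non-critical"]
y_pred = ["critical", "non-critical", "critical", "critical", "non-critical", "critical"]

critical_recall = class_recall(y_true, y_pred, "critical")
non_critical_recall = class_recall(y_true, y_pred, "non-critical")
```

In this toy case, one of the four critical tickets is missed, so the recall for the critical class is lower than it would be for a classifier that flags every critical ticket. A missed critical ticket is the kind of error a project manager would most want to avoid, which is why the abstract singles out this metric.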

Full Text