Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning
This study introduces Mutualistic Neural Active Learning (MNAL), a cross-project framework that combines neural language models and active learning to automate bug report identification across GitHub repositories. MNAL reduces human labeling effort by up to 95.8% in terms of readability and 196.0% in terms of identifiability while achieving higher F1-scores, improving both the efficiency and the effectiveness of human-machine collaboration in software maintenance.
Bug reports, which span a wide range of bug types, are crucial for maintaining software quality. However, their growing complexity and volume make purely manual identification and assignment to the appropriate teams a significant challenge, as processing every report is time-consuming and resource-intensive. In this paper, we introduce a cross-project framework, dubbed Mutualistic Neural Active Learning (MNAL), designed for automated and more effective identification of bug reports from GitHub repositories, boosted by human-machine collaboration. MNAL uses a neural language model that learns and generalizes reports across different projects, coupled with active learning to form neural active learning. A distinctive feature of MNAL is the purposely crafted mutualistic relation between the machine learner (the neural language model) and the human labelers (developers) when enriching the learned knowledge. That is, the most informative human-labeled reports and their corresponding pseudo-labeled ones are used to update the model, while the reports that developers need to label are more readable and identifiable, thereby enhancing the human-machine teaming. We evaluate MNAL on a dataset of 1,275,881 reports from over 127,000 software projects against state-of-the-art approaches, baselines, and different variants. The results indicate that MNAL achieves up to 95.8% and 196.0% effort reduction in terms of readability and identifiability during human labeling, respectively, while achieving better performance (e.g., F1-score) in bug report identification. Additionally, MNAL is model-agnostic: it improves model performance with various underlying neural language models.
To further verify the efficacy of our approach, we conducted a qualitative case study involving 10 human participants, who rated MNAL as more effective and as saving more time and monetary resources. The dataset and code are publicly available at https://github.com/ideas-labo/MNAL.
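The basic loop behind neural active learning can be illustrated with a minimal Python sketch. It assumes the model already outputs class probabilities for each unlabeled report and substitutes two generic techniques for MNAL's actual criteria: entropy-based uncertainty sampling to choose which reports go to developers, and confidence-thresholded pseudo-labeling for the rest. The report ids, `query_size`, and `pseudo_threshold` are hypothetical, and the sketch does not model MNAL's readability and identifiability considerations.

```python
import math
from typing import Dict, List, Tuple


def entropy(probs: List[float]) -> float:
    """Shannon entropy of a predicted class distribution (higher means less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_batch(
    predictions: Dict[str, List[float]],
    query_size: int,
    pseudo_threshold: float,
) -> Tuple[List[str], Dict[str, int]]:
    """Split unlabeled reports into a human-labeling batch and a pseudo-labeled set.

    `predictions` maps a hypothetical report id to the model's class probabilities.
    """
    # Uncertainty sampling: the highest-entropy reports are sent to developers.
    ranked = sorted(predictions, key=lambda rid: entropy(predictions[rid]), reverse=True)
    to_query = ranked[:query_size]

    # Confidence-based pseudo-labeling: keep the model's own label only when it is sure.
    pseudo: Dict[str, int] = {}
    for rid in ranked[query_size:]:
        probs = predictions[rid]
        best = max(range(len(probs)), key=lambda i: probs[i])
        if probs[best] >= pseudo_threshold:
            pseudo[rid] = best
    return to_query, pseudo


preds = {"r1": [0.5, 0.5], "r2": [0.95, 0.05], "r3": [0.6, 0.4], "r4": [0.1, 0.9]}
print(select_batch(preds, query_size=1, pseudo_threshold=0.9))
# prints (['r1'], {'r4': 1, 'r2': 0})
```

Here "r1" is the least certain report and is queried, "r3" is neither queried nor pseudo-labeled, and the two confident reports receive pseudo-labels; in a full loop, both sets would then be used to fine-tune the model.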
- Research Article
9
- 10.1016/j.infsof.2022.106899
- Jul 1, 2022
- Information and Software Technology
Locality-based security bug report identification via active learning
- Book Chapter
9
- 10.1007/978-3-319-39225-7_16
- Jan 1, 2016
Issue tracking systems are used in most software projects, and in almost all free open source software projects in particular, to record many different kinds of issues: bug reports, feature requests, maintenance tickets, and even design discussions. Identifying which of those issues are bug reports is not a trivial task. When researchers want to study the bug reports managed by a software development project, they first need to perform this identification.
- Book Chapter
1
- 10.1007/978-981-13-2922-7_23
- Jan 1, 2018
Many software projects use bug tracking systems to collect and allocate bug reports, but assigning priorities becomes difficult as the number of bug reports grows. To reduce developers' burden of assigning a priority to each bug report, we propose an integration method that predicts priority levels using machine learning. Our approach uses the textual descriptions in bug reports as features and feeds them to three different classifiers. We apply these classifiers to bug reports of unknown type, obtaining three different results. At the same time, we set weights for each classifier, based on the characteristics of each project, to balance its ability to identify different categories. Finally, we use these weights to adjust the prediction results and assign a single priority to each bug report. We perform experiments on datasets from 4 Mozilla products, and the results show that our approach identifies the priority of bug reports better than previous general methods and ensemble methods.
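The weighting step can be sketched as a weighted vote. This is a minimal illustration under stated assumptions, not the paper's exact method: each classifier contributes a hypothetical per-project, per-label weight for the label it predicts, and the label with the highest total weight becomes the report's priority.

```python
from typing import Dict, List


def weighted_priority(predictions: List[str], weights: List[Dict[str, float]]) -> str:
    """Fuse the labels predicted by several classifiers into one priority.

    predictions[i] is the priority label output by classifier i, and weights[i][label]
    is a hypothetical per-project weight expressing how far classifier i is trusted
    when it predicts that label.
    """
    if len(predictions) != len(weights):
        raise ValueError("need one weight table per classifier")
    scores: Dict[str, float] = {}
    for label, table in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + table.get(label, 0.0)
    # Highest total weight wins; sorting first makes ties resolve to the smallest label.
    return max(sorted(scores), key=lambda label: scores[label])


# Three classifiers: one votes P2 with high trust, two vote P3 with lower trust.
votes = ["P2", "P3", "P3"]
trust = [{"P2": 0.9, "P3": 0.2}, {"P2": 0.5, "P3": 0.3}, {"P3": 0.4}]
print(weighted_priority(votes, trust))  # prints P2 (0.9 beats 0.3 + 0.4)
```

Per-label weights let a classifier that is reliable on one priority level but weak on another be trusted selectively, which is one way to read "balance the abilities of identifying different categories".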
- Conference Article
3
- 10.1109/iccke57176.2022.9960061
- Nov 17, 2022
Bug assignment, which routes a software project's bug reports to the appropriate fixers, is an important part of software development and maintenance. Manual bug assignment is a time-consuming process that delays debugging, so various machine learning and information retrieval approaches have been used to automate it. However, most previous deep learning-based studies have focused on the developers assigned to bug reports and have not specifically considered developers' collaboration and interaction in resolving them. Here, we propose a novel approach for automatic bug assignment based on Bidirectional Encoder Representations from Transformers (BERT) and a Preference Neural Network (PNN). First, we preprocess the textual data in the bug reports. Second, we use BERT as a word embedding technique to obtain vector representations of bug reports. Third, we calculate each developer's suitability score for each bug report based on different developer activity features. Finally, the PNN ranks developers for each bug report. Experiments are performed on open-source projects, namely Eclipse UI, Birt, JDT, and SWT, with top-k accuracy as the evaluation metric. The experimental results show that our approach markedly improves the performance of automatic bug assignment.
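Top-k accuracy counts a bug report as correctly assigned if its actual fixer appears among the first k developers in the predicted ranking. A minimal sketch, assuming each ranking is a list of developer names ordered from most to least suitable (the names below are hypothetical):

```python
from typing import List, Sequence


def top_k_accuracy(rankings: List[Sequence[str]], actual: List[str], k: int) -> float:
    """Fraction of bug reports whose actual fixer appears in the top-k ranked developers."""
    if len(rankings) != len(actual):
        raise ValueError("rankings and actual must have the same length")
    if not actual:
        return 0.0
    hits = sum(1 for ranked, fixer in zip(rankings, actual) if fixer in ranked[:k])
    return hits / len(actual)


rankings = [["alice", "bob", "carol"], ["bob", "carol", "alice"], ["carol", "alice", "bob"]]
fixers = ["alice", "alice", "dave"]
print(top_k_accuracy(rankings, fixers, k=3))  # 2 of 3 fixers are ranked in the top 3
```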
- Conference Article
21
- 10.1145/3275219.3275228
- Sep 16, 2018
Routing bug reports to potential fixers (i.e., bug triaging) is an integral step in software development and maintenance. However, manually inspecting and assigning bug reports is tedious and time-consuming, especially in software projects with large numbers of bug reports and developers. To make bug triaging more efficient, many machine learning and information retrieval based approaches have been proposed to automatically assign bug reports to suitable developers. However, these techniques typically ignore two important facts in bug fixing. First, for some bug reports, the reporter is also a developer in the project and is likely to fix the reported bug later. Second, for some bug reports, there may be a tossing sequence containing several developers, from the first potential fixer to the last actual fixer. Such tossing sequences encode valuable information, such as the dependencies among developers in the bug triaging task. To exploit these facts, we propose a sequence-to-sequence model named SeqTriage that automatically routes a given bug report to its responsible fixer. Evaluation results on three open-source projects show that the proposed approach significantly improves bug triaging accuracy over state-of-the-art approaches, by between 5% and 20%.
- Research Article
105
- 10.1016/j.infsof.2021.106530
- Jan 20, 2021
- Information and Software Technology
Improving high-impact bug report prediction with combination of interactive machine learning and active learning
- Conference Article
55
- 10.1145/2480362.2480568
- Mar 18, 2013
With a great number of software applications now developed, software maintenance has become an important and challenging task, particularly due to the increasing scale of software projects. Although developers can create and update bug reports in bug repositories to support software maintenance, a large software project receives a large number of bug reports each day. To reduce developers' workload, many researchers and software engineers have begun recommending appropriate developers to fix bugs. This process, called bug triage, is a hot research topic in software maintenance. In this paper, we propose a hybrid bug triage algorithm that combines a probability model and an experience model to rank all candidate developers for fixing a new bug. In this study, we adopt the smoothed Unigram Model (UM) instead of the traditional Vector Space Model (VSM) to search for similar bug reports. In the probability model, we use a social network to analyze the probability that a candidate developer fixes a new bug. We first propose adding a new feature (the number of re-opened bugs) to obtain the fixing probability. In the experience model, we consider the number of fixed bugs and the fixing cost of each candidate developer as estimation factors. In addition, we introduce a new concept, the activity factor, to better model developers' experience. We performed experiments on two large-scale open source projects. The results show that our method can effectively recommend the best developer for fixing bugs.
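The abstract does not say which smoothing the Unigram Model uses, so the following minimal sketch assumes Jelinek-Mercer smoothing, a common choice, to rank past bug reports by query likelihood. The report ids, tokens, and `lam` value are hypothetical, and the developer-ranking models are not covered.

```python
import math
from collections import Counter
from typing import Dict, List


def rank_similar_reports(query: List[str], reports: Dict[str, List[str]], lam: float = 0.5) -> List[str]:
    """Rank past bug reports by smoothed unigram query likelihood.

    Assumes Jelinek-Mercer smoothing:
    P(w | d) = (1 - lam) * tf(w, d) / |d| + lam * P(w | collection).
    """
    if not 0 < lam <= 1:
        raise ValueError("lam must be in (0, 1]")
    collection = Counter(tok for toks in reports.values() for tok in toks)
    total = sum(collection.values())
    if total == 0:
        return list(reports)
    scores: Dict[str, float] = {}
    for rid, toks in reports.items():
        tf = Counter(toks)
        score = 0.0
        for w in query:
            p_coll = collection[w] / total
            if p_coll == 0:
                continue  # a term absent from every report would give log(0) for all of them
            p_doc = tf[w] / len(toks) if toks else 0.0
            score += math.log((1 - lam) * p_doc + lam * p_coll)
        scores[rid] = score
    return sorted(scores, key=lambda rid: scores[rid], reverse=True)


past = {
    "r1": ["crash", "on", "startup"],
    "r2": ["ui", "button", "misaligned"],
    "r3": ["crash", "when", "saving"],
}
print(rank_similar_reports(["crash", "startup"], past))  # prints ['r1', 'r3', 'r2']
```

Unlike a vector space model, the collection term gives every report a nonzero probability for every seen query word, so a report that matches only part of the query still receives a finite, comparable score.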
- Conference Article
10
- 10.1109/iciev.2018.8641045
- Jun 1, 2018
Nowadays, software projects receive a huge number of bug reports daily. Among them, security and performance bug reports are of higher priority to software developers and users, so identifying them as soon as they are reported is essential. However, bug tracking systems provide no mechanism to separate them from the rest of the bug reports. In this paper, we propose a learning-based approach to identify security and performance bug reports that addresses the class-bias and feature-skew phenomena. We propose two separate classification models, Sec-Model and Perf-Model: the former classifies a bug report as a security or non-security bug report, and the latter as a performance or non-performance bug report. We evaluated our approach on four bug report datasets from four software projects: Ambari, Camel, Derby, and Wicket. We measured the performance of our two models using the area under the receiver operating characteristic curve (AUC). The average AUC values of Sec-Model and Perf-Model are 0.67 and 0.71, respectively.
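AUC equals the probability that a randomly chosen positive report (e.g., a security report) receives a higher score than a randomly chosen negative one. A minimal sketch computing it directly from that pairwise definition (the Mann-Whitney formulation); the labels and scores are illustrative, and the quadratic pair loop is meant for clarity rather than large datasets:

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U statistic).

    labels are 1 for the positive class and 0 otherwise; a tie between a positive
    and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # prints 0.75
```

Under this reading, an average AUC of 0.67 means a security report outranks a non-security report in about two out of three random pairs.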
- Conference Article
23
- 10.1109/apsec.2018.00097
- Dec 1, 2018
Background: Automated bug localization, which finds the source files relevant to a bug report among a large number of files, is a crucial task in software engineering. However, the different representations of bug reports and source files limit the accuracy of existing bug localization techniques. Aims: We propose a novel deep learning-based model that improves the accuracy of bug localization by representing bug reports at the character level and analyzing them with a language model. Method: The proposed model consists of two main parts: a character-level convolutional neural network (CNN) and a recurrent neural network (RNN) language model. Both bug reports and source files are represented at the character level and fed into the CNN, whose output is passed to an RNN encoder-decoder architecture. Results: Preliminary experiments show that the proposed model achieves comparable or even higher accuracy than the existing machine translation-based bug localization technique. Conclusion: The proposed model can automatically localize buggy files for bug reports and achieves better accuracy by analyzing them at the character level, where both bug reports and source code can be expressed.
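Representing text "at the character level" typically means mapping each character to an integer index and padding or truncating to a fixed length before the CNN's embedding layer. A minimal sketch, assuming a hypothetical ASCII alphabet with index 0 reserved for padding and unknown characters; the abstract does not specify the alphabet or input length:

```python
import string
from typing import List

# Hypothetical alphabet; index 0 is reserved for padding and unknown characters.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_chars(text: str, max_len: int) -> List[int]:
    """Map text to a fixed-length sequence of character indices for a char-level CNN."""
    ids = [CHAR_TO_INDEX.get(ch, 0) for ch in text.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


print(encode_chars("NPE!", 6))  # prints [14, 16, 5, 37, 0, 0]
```

Because bug report prose and source code share the same character alphabet, one encoder can serve both inputs, which is the motivation the abstract gives for working at the character level.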
- Research Article
2
- 10.1016/j.infsof.2025.107778
- Sep 1, 2025
- Information and Software Technology
Identifying security bugs in software is critical to minimize vulnerability windows. Traditionally, bug reports are submitted through issue trackers and manually analyzed, which is time-consuming. Challenges such as data scarcity and class imbalance generally hinder the development of effective machine learning models that could automate this task. Generative Pre-trained Transformer (GPT) models do not require training and are less affected by the imbalance problem. They have therefore gained popularity for various text-based classification tasks and appear to be a natural, highly promising solution to this problem. This paper explores the potential of using GPT models to identify security bug reports from the perspective of a user of such models. We aim to assess their classification performance on this task compared to traditional machine learning (ML) methods, while also investigating how different factors, such as the prompt used and the datasets' characteristics, affect their results. We evaluate the performance of four state-of-the-art GPT models (i.e., GPT4All-Falcon, Wizard, Instruct, OpenOrca) on the task of security bug report identification. We use three different prompts for each GPT model and compare the results with traditional ML models. The empirical results are based on bug report data from seven projects (i.e., Ambari, Camel, Derby, Wicket, Nova, OpenStack, and Ubuntu). GPT models show noticeable difficulties in identifying security bug reports, with performance generally lower than that of traditional ML models. The effectiveness of the GPT models varies considerably depending on the specific model, the prompt, and the dataset. Although GPT models are now used for many types of tasks, including classification, their current performance in security bug report identification is surprisingly insufficient and inferior to that of traditional ML models.
Further research is needed to address the challenges identified in this paper in order to effectively apply GPT models to this particular domain.
- Conference Article
5
- 10.1109/ubmk55850.2022.9919454
- Sep 14, 2022
In software projects, bug reports remain open and receive updates from team members during the related bug's lifetime. Predicting when a bug will be resolved is an important task, as it lets managers plan timelines and allocate team resources accordingly. Prior work shows that reporter information is the most effective feature for predicting resolution time. However, previous work considers only bug reporting activity when defining reporter reputation and misses other activities. In this paper, we propose a new reputation calculation method that considers all activities of a reporter within a bug tracking system. We collected bug reports from the Chromium and WebRTC projects and calculated the reputation of each bug reporter in the dataset using our weighted, activity-based calculation method, as well as other methods from previous work, to compare performance. We trained a Doc2Vec model on the textual information in bug reports to build a base model for comparing different reputation methods. Bug reports are classified into two categories, FAST and SLOW, according to their resolution time. Stochastic Gradient Descent (SGD) and Extreme Gradient Boosting (XGB) classifiers are employed. The XGB classifier achieved F-scores of 55% and 72% for FAST and SLOW, respectively. The results also show that a project's data characteristics affect how effective a reputation calculation method is for bug resolution time prediction.
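Framing resolution time as a FAST/SLOW classification and reporting one F-score per class can be sketched as follows. The cut-off in days is a hypothetical parameter, since the abstract does not say how FAST and SLOW are separated; the F-score is the standard F1 with the chosen class treated as positive.

```python
from typing import List


def label_resolution(days: List[float], threshold: float) -> List[str]:
    """Label each bug FAST if resolved within `threshold` days (an assumed cut-off), else SLOW."""
    return ["FAST" if d <= threshold else "SLOW" for d in days]


def f_score(actual: List[str], predicted: List[str], positive: str) -> float:
    """F1 score treating `positive` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


actual = label_resolution([1, 10, 3, 20], threshold=5)  # ['FAST', 'SLOW', 'FAST', 'SLOW']
predicted = ["FAST", "FAST", "SLOW", "SLOW"]
print(f_score(actual, predicted, "FAST"), f_score(actual, predicted, "SLOW"))  # prints 0.5 0.5
```

Computing the score separately per class explains how a single classifier can report different values for FAST and SLOW.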
- Conference Article
16
- 10.1109/icsme.2014.66
- Sep 1, 2014
Software bug reports are important project artifacts that evolve throughout the life of a software project. Software bugs are issues that users report when these issues hinder their work. Software projects evolve over time as bugs are addressed and new features are added. Managing bugs can be a significant challenge, as a project manager generally needs to be aware of all the bug reports for the current version, and this becomes even more challenging when the number of bug reports is large. Ideally, a developer new to a project builds knowledge of the project and its bug reports while working on it, which is likely to help her avoid or handle the reported issues. In this paper, we propose a prototype that helps developers review a project's bug reports by interactively visualizing insightful information about them using topic analysis. In addition, to reduce developers' time and effort when studying a bug report, the prototype also provides an extractive summary visualization of each bug report. We show that the proposed prototype performs better in terms of precision, recall, and F-measure than a baseline approach that uses time-sensitive keyword extraction.
- Book Chapter
- 10.1007/978-981-10-6496-8_27
- Sep 21, 2017
Deep neural network language models have developed significantly in natural language processing (NLP) in recent years. In this paper, we focus on using a neural network language model (NNLM) to enhance microblog search and propose a microblog search method based on a neural network language model (NBSM). First, we train a neural network language model on microblog data to obtain distributed representations of words, which may capture the internal expression patterns of microblogs. Then, we use these distributed representations to find expansion words for users' search terms. Finally, we re-rank microblog search results by combining deep semantic text similarity with social signal features. The proposed method effectively captures how microblogs are expressed, and its search results reflect the social hot topics related to users' search terms. Experimental results show that the proposed method yields significant improvements over state-of-the-art methods and significantly improves the user's search experience.
- Conference Article
16
- 10.1145/3377816.3381738
- Jun 27, 2020
Bug severity is an important factor in prioritizing which bugs to fix first. Triaging bug reports and assigning a severity requires developer expertise and knowledge of the underlying software. Methods to automate severity assignment have been developed to reduce developer cost; however, many of these methods require 70-90% of a project's bug reports as training data, which delays their use until later in the development process. Being unable to automatically predict a bug report's severity early in a project can greatly reduce the benefits of automation. We have developed a new bug report severity prediction method that leverages how bug reports are written rather than what they contain. Our method allows bug severity to be predicted at the beginning of a project by training the prediction classifier on an organization's historical data, in the form of bug reports from past projects. To validate our approach, we conducted over 1000 experiments on a dataset of five NASA robotic mission software projects. Our results demonstrate that our method not only predicted bug severity earlier in development but also outperformed an existing keyword-based classifier for a majority of the NASA projects.
CCS Concepts: • Software and its engineering → Software maintenance tools; Maintaining software; Software testing and debugging; • Computing methodologies → Machine learning.
- Conference Article
119
- 10.1109/csmr.2012.48
- Mar 1, 2012
Bugs are prevalent in software systems. To improve the reliability of software systems, developers often allow end users to provide feedback on bugs they encounter. Users can do this by submitting a bug report to a bug report management system such as Bugzilla. However, this process is uncoordinated and distributed, which means that many users may submit bug reports describing the same problem. These are referred to as duplicate bug reports. Many duplicate bug reports can cause considerable unnecessary manual effort, as a triager often needs to manually tag bug reports as duplicates. Recently, a number of studies have investigated the duplicate bug report problem, in effect answering the following question: given a new bug report, retrieve k other similar bug reports. This, however, still requires substantial manual effort, which could be reduced further. Jalbert and Weimer were the first to introduce direct detection of duplicate bug reports, which answers the question: given a new bug report, classify whether it is a duplicate bug report or not. In this paper, we extend Jalbert and Weimer's work by improving the accuracy of automated duplicate bug report identification. We experimented with bug reports from the Mozilla bug tracking system reported between February 2005 and October 2005 and found that we could improve the accuracy of the previous approach by about 160%.
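The two formulations contrasted above, retrieving the top-k similar reports versus deciding directly whether a report is a duplicate, can be sketched with bag-of-words cosine similarity. This is a minimal illustration, not Jalbert and Weimer's classifier or the extended approach: the tokens are hypothetical, and the single similarity `threshold` is an assumed stand-in for a trained classifier with richer features.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_similar(new: List[str], past: Dict[str, List[str]], k: int) -> List[Tuple[str, float]]:
    """Retrieval formulation: the k past reports most similar to a new one."""
    query = Counter(new)
    sims = [(rid, cosine(query, Counter(toks))) for rid, toks in past.items()]
    return sorted(sims, key=lambda pair: pair[1], reverse=True)[:k]


def is_duplicate(new: List[str], past: Dict[str, List[str]], threshold: float) -> bool:
    """Classification formulation: flag the report if its best match is similar enough."""
    best = top_k_similar(new, past, 1)
    return bool(best) and best[0][1] >= threshold


past = {"b1": ["app", "crash", "on", "save"], "b2": ["add", "dark", "mode"]}
new = ["app", "crash", "when", "save"]
print(top_k_similar(new, past, 1))  # prints [('b1', 0.75)]
print(is_duplicate(new, past, threshold=0.7))  # prints True
```

The retrieval version still leaves a triager to inspect k candidates, whereas the classification version returns a single yes/no decision, which is the additional reduction in manual effort the abstract describes.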