Deep mining of open source software bug repositories

Abeer Hamdy, Gloria Ezzat

doi:10.1080/1206212x.2020.1855705

Abstract

Large-scale software projects adopt bug tracking systems such as Bugzilla and Jira to manage bug fixes and store their information. Mining bug repositories is essential for automating maintenance activities such as bug triage. Recently, embedding and deep learning models have emerged as a breakthrough for text classification applications. This paper proposes using word embedding and deep learning models to mine bug repositories in order to identify the severity classes of newly reported bugs. Embedding models were used to represent bug reports because of their ability to capture semantic relations among words and sentences. Five effective deep learning models (CNN, LSTM, GRU, and the hybrids CNN-LSTM and CNN-GRU) were trained to classify bug reports into their severity classes. We tried these models because each has different capabilities: a CNN can extract key features of an input and reduce the size of the feature space, while LSTM and GRU can represent sequences of words (sentences) effectively. Experimental results on two large-scale open bug repositories, Eclipse and Mozilla, demonstrated that the CNN model is superior. However, the micro-average classification performance of the CNN could also be achieved by some traditional classifiers, e.g. SVM and KNN, when trained on the embedding vectors instead of the bag-of-words model.
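The idea of training a traditional classifier on embedding vectors rather than on bag-of-words features can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional word vectors, the two severity labels, the mean-of-word-vectors report representation, and the function names (`embed_report`, `predict_severity`) are all assumptions made for illustration, and a one-nearest-neighbour rule stands in for the KNN classifier mentioned above.

```python
import math

# Hypothetical 3-dimensional word vectors (assumption). A real system would
# load vectors from a trained embedding model instead of this toy table.
TOY_EMBEDDINGS = {
    "crash":  [0.9, 0.1, 0.0],
    "freeze": [0.8, 0.2, 0.1],
    "data":   [0.7, 0.0, 0.3],
    "loss":   [0.9, 0.0, 0.2],
    "typo":   [0.0, 0.9, 0.1],
    "label":  [0.1, 0.8, 0.2],
    "button": [0.2, 0.7, 0.3],
    "color":  [0.0, 0.8, 0.4],
}
DIMENSIONS = 3


def embed_report(text):
    """Represent a bug report as the mean of its known word vectors."""
    vectors = [TOY_EMBEDDINGS[w] for w in text.lower().split() if w in TOY_EMBEDDINGS]
    if not vectors:
        # No known words: fall back to the zero vector.
        return [0.0] * DIMENSIONS
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def predict_severity(text, training_set):
    """1-nearest-neighbour: return the label of the most similar training report."""
    query = embed_report(text)
    best = max(training_set, key=lambda item: cosine(query, embed_report(item[0])))
    return best[1]


# Hypothetical labelled reports (assumption); the labels are illustrative only.
TRAINING_SET = [
    ("crash with data loss", "severe"),
    ("typo in button label", "non-severe"),
]

if __name__ == "__main__":
    print(predict_severity("application freeze then crash", TRAINING_SET))
```

Because similarity is computed in the embedding space, the query "application freeze then crash" can match the "crash with data loss" report even though the two share only one word, which is the kind of semantic relation a bag-of-words representation does not capture.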

Full Text