Developing bug severity prediction models using word2vec

Rashmi Agrawal, Rinkaj Goyal

doi:10.1016/j.ijcce.2021.08.001

Abstract

Bug tracking systems use repositories to keep track of bugs and thereby improve software quality. Manually analyzing each bug and classifying it according to its severity is an unmanageable job. Therefore, it is imperative to classify bug severity correctly and automatically, since a lay user might otherwise misclassify it. Text mining techniques have the potential to analyze such massive databases of textual bug-report descriptions and to classify bug severity levels adequately. Word embedding is a state-of-the-art text mining technique that captures the semantics of text and groups words according to their relevance in a document. Word-embedding models map words to real-valued vectors. Word2vec is one such model and has proven effective at representing word meanings. However, its performance depends on the configuration of its hyperparameters and on feature selection. Tuning hyperparameters is time-consuming, yet identifying the correct set of parameters is crucial. This paper examines how effectively word2vec features help classifiers predict bug severity from a bug report, and studies the impact of different averaging methods and of the word-embedding parameter configuration on the classifiers' performance. Results show that a larger window size improves classifier performance, whereas the influence of the minimum word count parameter is mixed and depends on the selected data set. Further, among the classifiers used, Random Forest and XGBoost could classify the severity level for classes with few records or with rare occurrences of class-specific words. Otherwise, the Support Vector Machine and Naive Bayes classifiers performed best and worst, respectively.
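To make the two hyperparameters and the averaging step concrete, here is a minimal sketch in standard-library Python. It shows where word2vec's minimum word count (`min_count`) and window size (`window`) act when training pairs are built, and two common ways to average word vectors into one bug-report vector: a plain mean and a TF-IDF-weighted mean. All function names, the toy tokens and the hand-written 2-dimensional vectors are hypothetical. Real vectors would come from a trained word2vec model, the fixed window is a simplification, and TF-IDF weighting is shown only as one common averaging variant, not necessarily one of the methods evaluated in the paper.

```python
import math
from collections import Counter

# Hypothetical helpers; documents are assumed to be pre-tokenized word lists.


def build_vocab(docs, min_count):
    """Keep tokens occurring at least `min_count` times in the whole corpus
    (the role of word2vec's minimum word count parameter)."""
    counts = Counter(tok for doc in docs for tok in doc)
    return {tok for tok, c in counts.items() if c >= min_count}


def skipgram_pairs(doc, vocab, window):
    """Return (center, context) training pairs for one document.

    Out-of-vocabulary tokens are dropped first; each remaining token is then
    paired with every token at most `window` positions away. (Real word2vec
    implementations usually also shrink the window at random per token.)"""
    toks = [t for t in doc if t in vocab]
    pairs = []
    for i, center in enumerate(toks):
        for j in range(max(0, i - window), min(len(toks), i + window + 1)):
            if j != i:
                pairs.append((center, toks[j]))
    return pairs


def mean_vector(doc, emb, dim):
    """Unweighted average of the vectors of the document's known tokens."""
    vecs = [emb[t] for t in doc if t in emb]
    if not vecs:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]


def tfidf_vector(doc, emb, docs, dim):
    """TF-IDF-weighted average: words rare across the corpus weigh more.
    Uses the smoothed idf log((1 + n) / (1 + df)) + 1, which is always > 0."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    tf = Counter(t for t in doc if t in emb)
    if not tf:
        return [0.0] * dim
    weights = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
    total = sum(weights.values())
    return [sum(w * emb[t][k] for t, w in weights.items()) / total for k in range(dim)]


if __name__ == "__main__":
    reports = [["app", "crash", "on", "start"], ["crash", "on", "save"], ["typo", "in", "label"]]
    print(sorted(build_vocab(reports, min_count=2)))  # ['crash', 'on']

    abc = ["a", "b", "c"]
    print(len(skipgram_pairs(abc, set(abc), window=1)))  # 4
    print(len(skipgram_pairs(abc, set(abc), window=2)))  # 6

    toy_emb = {"crash": [1.0, 0.0], "on": [0.0, 1.0]}  # made-up vectors
    print(mean_vector(["crash", "on"], toy_emb, dim=2))  # [0.5, 0.5]
```

A larger `window` yields more (and more distant) context pairs per word, while a higher `min_count` discards rare words entirely. That second effect is one plausible reason its influence could vary across data sets, since rare class-specific words may be exactly what distinguishes some severity levels.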
