Text Mining Approaches for Dependent Bug Report Assembly and Severity Prediction

Bancha Luaphol, Jantima Polpinij, Manasawee Kaenampornpan

doi:10.34028/iajit/19/6/9

Abstract

Most existing bug report studies address only a single, specific issue. Considering multiple issues at once is required for a more complete and comprehensive bug-fixing process. We took up this challenge and propose a method that analyzes two bug report issues using text mining techniques. First, dependent bug reports are assembled into individual clusters, and then the bug reports in each cluster are analyzed for their severity. Dependent bug report assembly is tested with threshold-based similarity analysis. Cosine similarity and BM25, with term frequency (tf) weighting, are compared to find the most appropriate method. Meanwhile, four classification algorithms, namely Random Forest (RF), Support Vector Machines (SVM) with the RBF kernel function, Multinomial Naïve Bayes (MNB), and k-Nearest Neighbor (k-NN), are used to model the bug severity predictor with four term weighting schemes: tf, term frequency-inverse document frequency (tf-idf), term frequency-inverse class frequency (tf-icf), and term frequency-inverse gravity moment (tf-igm). In the experiments, BM25 was found to be the most appropriate for dependent bug report assembly, while for severity prediction, RF with tf-icf weighting yielded the best performance.
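As an illustration of threshold-based similarity analysis, the following is a minimal sketch of grouping bug reports by cosine similarity over raw tf vectors. It is not the authors' implementation: the whitespace tokenizer, the greedy "compare against the first report of each cluster" strategy, the threshold of 0.5, and the function names `tf_vector`, `cosine`, and `assemble` are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector (assumed tokenizer: lowercase + whitespace split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse tf vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty report is similar to nothing
    return dot / (norm_a * norm_b)


def assemble(reports, threshold=0.5):
    """Greedy grouping (an assumed strategy, not taken from the paper).

    Each report joins the first cluster whose seed (first) report is at
    least `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of report indices.
    """
    vectors = [tf_vector(r) for r in reports]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    reports = [
        "app crashes when saving file",
        "crashes when saving large file in app",
        "login button color is wrong",
    ]
    print(assemble(reports, threshold=0.5))
```

With these toy reports, the first two share five terms and fall into one cluster, while the third shares none and forms its own. Swapping `cosine` for a BM25 scoring function would give the other variant compared in the abstract; the threshold would then need to be chosen on BM25's unbounded score scale rather than on [0, 1].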
