An Analysis of Software Bug Reports Using Machine Learning Techniques

Ha Manh Tran,Sinh Van Nguyen,Phong Thanh Ho,Son Thanh Le

doi:10.1007/s42979-019-0004-1

Abstract

Bug tracking systems manage bug reports for assuring the quality of software products. A bug report (alsoreferred as trouble, problem, ticket or defect) contains several features for problem management and resolution purposes. Severity and priority are two essential features of a bug report that define the effect level and fixing order of the bug. Determining these features is challenging and depends heavily on human being, e.g., software developers or system operators, especially for assessing a large number of error and warning events occurring on software products or network services. This study first proposes a comparison of machine learning techniques for assessing severity and priority for software bug reports and then chooses an approach of using optimal decision trees, or random forest, for further investigation. This approach aims at constructing multiple decision trees based on the subsets of the existing bug dataset and features, and then selecting the best decision trees to assess the severity and priority of new bugs. The approach can be applied for detecting and forecasting faults in large, complex communication networks and distributed systems today. We have presented the applicability of random forest for bug report analysis and performed several experiments on software bug datasets obtained from open source bug tracking systems. Random forest yields an average accuracy score of 0.75 that can be sufficient for assisting system operators in determining these features. We have provided some analysis of the experimental results.

Full Text