Due to the unavoidable bugs appearing in the most of the software systems, bug resolution has become one of the most important activities in software maintenance. To decrease the time cost in manual work, text classification techniques are applied to automatically identify severity of bug reports. In this paper, we address the problem of low-quality and class imbalance for identifying the severity of bug reports. First, we combine feature selection with instance selection to simultaneously reduce the bug report dimension and the word dimension, which could get small-scale and high-quality reduced data set. Then, an improve random oversampling technique, named, RSMOTE, which is presented to weaken the imbalancedness degree of class distribution. Finally, to avoid the random over-sampling uncertainty of RSMOTE, we develop an ensemble learning algorithm, which is based on Choquet fuzzy integral, to combine multiple RSMOTE. We empirically investigate the performance of data reduction on ten data sets of three large open source projects, namely, Eclipse, Mozilla, and GNOME. The results show that our approach can effectively reduce the data scale and improve the performance of identifying the severity of bug reports.
Read full abstract