Duplication Detection for Software Bug Reports Based on BM25 Term Weighting

Cheng-Zen Yang, Ing-Xiang Chen, Sin-Sian Wu, Hung-Hsueh Du

doi:10.1109/taai.2012.20

Abstract

Handling bug reports is an important issue in software maintenance. Recently, the detection of duplicate bug reports has received much attention, for two main reasons. First, processing redundant reports wastes human resources. Second, duplicate bug reports may provide additional information that is useful for further software maintenance. Previous studies have proposed many detection schemes based on information retrieval and natural language processing techniques. In this work, we propose a novel detection scheme based on BM25 term weighting. We have conducted empirical experiments on three open source projects: Apache, ArgoUML, and SVN. The experimental results show that the BM25-based scheme effectively improves detection performance in nearly all cases.

Full Text