Concept-Based Classification of Software Defect Reports

Sangameshwar Patil

doi:10.1109/msr.2017.20

Abstract

Automatic identification of the defect type from the textual description of a software defect can significantly speed-up as well as improve the software defect management life-cycle. This has been recognized in the research community and multiple solutions based on supervised learning approach have been proposed in the recent literature. However, these approaches need significant amount of labeled training data for use in real-life projects. In this paper, we propose to use Explicit Semantic Analysis (ESA) to carry out concept-based classification of software defect reports. We compute the semantic between the defect type labels and the defect report in a concept space spanned by Wikipedia articles and then, assign the defect type which has the highest similarity with the defect report. This approach helps us to circumvent the problem of dependence on labeled training data. Experimental results show that using concept-based classification is a promising approach for software defect classification to avoid the expensive process of creating labeled training data and yet get accuracy comparable to the traditional supervised learning approaches. To the best of our knowledge, this is the first use of Wikipedia and ESA for software defect classification problem.

Full Text