Abstract

This work computes the similarity between sentences of software bug reports in order to summarize them. Two methods are used: Latent Semantic Analysis (LSA) and TextRank. LSA computes the semantic similarity between the sentences of a bug report by inferring deeper, hidden relations between words. Pairs of sentences whose semantic similarity exceeds a set threshold are identified, and only one sentence from each such pair is retained. The remaining sentences are passed to the TextRank algorithm, and sentences with high similarity are further selected to generate a coherent summary. The proposed approach is evaluated on a newly constructed Apache Project Bug Report Corpus and on the existing Bug Report Corpus, and it is compared with baseline approaches that focus mainly on lexical similarity. On the Apache Project Bug Report Corpus, the approach attains average precision, recall, F-score, and pyramid precision of 80%, 72.57%, 76.05%, and 76.57%, respectively.
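The two-stage selection described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the pairwise sentence similarities are already available as a symmetric matrix (in the paper they would come from LSA). The function names, the rule of keeping the earlier sentence of a redundant pair, the threshold of 0.8, the damping factor of 0.85, and the example matrix are all hypothetical choices made for the sketch.

```python
def drop_redundant(sim, threshold):
    """Keep one sentence from every pair whose similarity exceeds threshold."""
    kept = []
    for i in range(len(sim)):
        # Assumption: of two near-duplicate sentences, the earlier one is kept.
        if all(sim[i][j] <= threshold for j in kept):
            kept.append(i)
    return kept


def textrank(sim, nodes, damping=0.85, iterations=50):
    """Score the given sentences by weighted PageRank over the similarity graph."""
    scores = {i: 1.0 for i in nodes}
    # Total edge weight leaving each node (self-similarity excluded).
    out_weight = {j: sum(sim[j][k] for k in nodes if k != j) for j in nodes}
    for _ in range(iterations):
        new_scores = {}
        for i in nodes:
            rank = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in nodes
                if j != i and out_weight[j] > 0
            )
            new_scores[i] = (1 - damping) + damping * rank
        scores = new_scores
    return scores


def summarize(sim, threshold=0.8, top_k=2):
    """Return indices of summary sentences, in their original order."""
    kept = drop_redundant(sim, threshold)
    scores = textrank(sim, kept)
    best = sorted(kept, key=lambda i: scores[i], reverse=True)[:top_k]
    return sorted(best)


# Hypothetical similarities for four sentences; 0 and 1 are near-duplicates.
example_sim = [
    [1.00, 0.95, 0.50, 0.10],
    [0.95, 1.00, 0.50, 0.10],
    [0.50, 0.50, 1.00, 0.40],
    [0.10, 0.10, 0.40, 1.00],
]
print(summarize(example_sim))  # [0, 2]
```

In this example, sentence 1 is dropped as redundant with sentence 0. Of the remaining three, sentences 2 and 0 have the strongest connections in the similarity graph, so they form the two-sentence summary.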
