Automatic Keyword and Sentence-Based Text Summarization for Software Bug Reports

Shubhra Goyal Jindal, Arvinder Kaur

doi:10.1109/access.2020.2985222

Abstract

Text summarization is a process that efficiently retrieves the relevant information from documents. The objective of the proposed unsupervised approach is to summarize bug reports (software artefacts) with complete content and diversified information. The approach uses Rapid Automatic Keyword Extraction (RAKE) and the term frequency-inverse document frequency (TF-IDF) method to extract meaningful keywords and key-phrases, each with a relevance score. For sentence extraction, fuzzy C-means clustering is used to extract, from each cluster, the sentences whose degree of membership exceeds a set threshold. A rule engine is used for sentence selection; its rules are generated from domain knowledge and from the information extracted by the keywords and by the sentences selected through clustering. The proposed method generates cohesive and coherent summaries of Apache bug reports. To remove redundancy and re-rank the generated summary, hierarchical clustering is applied, which enriches the extracted summary. The approach is evaluated on the newly constructed Apache project Bug Report Corpus (APBRC) and the existing Bug Report Corpus (BRC). The results are compared using the performance metrics precision, recall, pyramid precision and F-score. The experimental results show that the proposed approach attains significant improvement over baseline approaches such as BRC and LRCA, as well as over existing state-of-the-art unsupervised approaches such as Hurried, centroid and others. It extracts significant keyword phrases and sentences from each cluster to achieve full coverage and a coherent summary. On the APBRC corpus, the approach attains average values of 78.22%, 82.18%, 80.10% and 81.66% for precision, recall, F-score and pyramid precision, respectively.
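To make the evaluation metrics concrete, the following is a minimal sketch of how precision, recall and F-score can be computed for an extractive summary, assuming that both the generated summary and the reference (gold) summary are represented as sets of sentence indices. The helper name `summary_scores` and the example indices are illustrative assumptions and are not taken from the paper. Pyramid precision, which weights sentences by annotator agreement, is not covered here.

```python
def summary_scores(selected, reference):
    """Return (precision, recall, f_score) for two sets of sentence indices.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)  # sentences present in both summaries
    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Made-up example: the summarizer picked sentences 0, 2, 5 and 7,
# while annotators marked 0, 2, 3, 7 and 9 as summary-worthy.
p, r, f = summary_scores({0, 2, 5, 7}, {0, 2, 3, 7, 9})
print(f"precision={p:.2f} recall={r:.2f} f-score={f:.2f}")
```

Note that averages reported over a corpus are typically computed per bug report and then averaged, so an average F-score need not equal the harmonic mean of the average precision and average recall.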

Highlights

In recent years, plenty of information has become available on the internet from several domains
We focus on an unsupervised approach and construct a new method based on keyword-based and sentence-based features to facilitate bug report summarization
This paper proposes an unsupervised approach to automatically summarize software bug reports based on keywords and sentence-based features
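As a rough illustration of what a keyword-based sentence feature might look like, the sketch below scores each sentence of a bug report by the TF-IDF weights of its terms, treating every sentence as one document. This is only one possible reading under stated assumptions: the tokenizer, the function names `tokenize` and `tfidf_sentence_scores`, and the example sentences are hypothetical, and the RAKE key-phrase extraction, fuzzy C-means clustering and rule engine described in the abstract are not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_sentence_scores(sentences):
    """Score each sentence by the summed TF-IDF weights of its distinct terms.

    Term frequency is normalized by sentence length, and each sentence counts
    as one 'document' for the inverse document frequency. Illustrative
    assumption only; not the weighting scheme specified in the paper.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: the number of sentences in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        weight = sum((count / len(tokens)) * math.log(n / df[term])
                     for term, count in tf.items())
        scores.append(weight)
    return scores


# Made-up bug report sentences, for illustration only.
report = [
    "The server crashes when the config file is missing.",
    "Thanks for the report.",
    "The crash is a null pointer exception in the config loader.",
]
for sentence, score in zip(report, tfidf_sentence_scores(report)):
    print(f"{score:.3f}  {sentence}")
```

Terms that occur in every sentence (here, "the") receive an inverse document frequency of zero and therefore do not contribute to any score, which is the usual motivation for combining term frequency with inverse document frequency.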

Summary

Introduction

Plenty of information is available on the internet from several domains. With the huge amount of available data, reading entire text documents to retrieve relevant information is an arduous and time-consuming task. Text summarization is used to obtain the relevant information automatically and in brief. Generating an accurate summary of a text document is a complex task that requires human intelligence to extract meaningful information from the text. Automatic text summarization has been used in several domains, such as document summarization [1], [2], essay or news summarization [3], [4] and e-mail summarization [5], [6]. To manage the large number of diverse bug reports, several automation tasks have been carried out, such as detection of duplicate bug
