Abstract

Problem statement: The goal of document summarization is to provide a summary or outline of multiple documents in reduced time. Sentence extraction is a technique employed to pick out relevant and important sentences from documents and present them as a summary. There is therefore a need for a more meaningful sentence-selection strategy that extracts the most significant sentences. Approach: This study proposes an approach for generating an initial and an update summary by performing sentence-level semantic analysis. To select the necessary information from the documents, all sentences are annotated with aspects, prepositions and named entities. To detect the most dominant concepts within a document, Wikipedia is used as a resource, and the weight of each word is calculated using the Term Synonym Concept Frequency-Inverse Sentence Frequency (TSCF-ISF) measure. Sentences are ranked by their assigned scores, and the summary is formed from the highest-ranking sentences. Results: The quality of a summary is evaluated by the coverage between the machine summary and a human summary using the intrinsic measures Precision and Recall; Precision measures exactness, whereas Recall measures completeness. Our results are compared with the LexRank update summarization task and with the Semantic Summary Generation method. The ROUGE-1 measure is used to determine how well the machine-generated summary correlates with the human summary. Conclusion: The performance of update summarization relies heavily on the measurement of sentence similarity based on TSCF-ISF. The experimental results show low overlap between an initial summary and its update summary.
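The weighting and ranking described above can be sketched in code. The exact TSCF-ISF formula is not given in the abstract, so the following is a minimal illustrative sketch that assumes a TF-ISF-style form in which a term's count is pooled with its synonyms (via a hypothetical `synonyms` lookup table) before applying the inverse-sentence-frequency factor; function names and the tokenization are illustrative, not the authors' implementation.

```python
import math
from collections import Counter

def tscf_isf_weights(sentences, synonyms):
    """Assign each term a Term Synonym Concept Frequency x Inverse
    Sentence Frequency weight (a TF-ISF-style sketch in which a word
    and its synonyms are counted as one concept)."""
    # Map each word to a canonical concept via the synonym table.
    def concept(word):
        return synonyms.get(word, word)

    tokenized = [[concept(w) for w in s.lower().split()] for s in sentences]
    n = len(tokenized)
    # Number of sentences in which each concept appears at least once.
    sent_freq = Counter(c for toks in tokenized for c in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({c: (count / len(toks)) * math.log(n / sent_freq[c])
                        for c, count in tf.items()})
    return weights

def rank_sentences(sentences, synonyms, k=2):
    """Score each sentence as the sum of its concept weights and
    return the k highest-ranking sentences as the extractive summary."""
    weights = tscf_isf_weights(sentences, synonyms)
    scores = [sum(w.values()) for w in weights]
    ranked = sorted(range(len(sentences)),
                    key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:k]]
```

Note that a concept occurring in every sentence receives weight zero (log of 1), so sentences are favored for carrying concepts that are dominant but not ubiquitous.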

Highlights

  • Online web content is growing at an increasing speed, so people need to form a crisp overview of a large number of articles in a short time

  • To provide more semantic information, the guided summarization task was introduced by the Text Analysis Conference (TAC)

  • The proposed method can be split into the following modules: (1) summary generation algorithm, (2) sentence annotation, (3) Wikipedia-based semantic element extraction, (4) initial summary generation and (5) update summary generation
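The Precision and Recall used to evaluate the generated summaries can be computed from unigram overlap between the machine summary and the human reference, which is the basis of the ROUGE-1 measure mentioned in the abstract. A minimal sketch (function name and tokenization are illustrative, not a full ROUGE implementation with stemming or stop-word handling):

```python
from collections import Counter

def rouge_1(candidate, reference):
    """ROUGE-1 precision and recall: clipped unigram overlap between
    a machine-generated summary and a human reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection clips each unigram's count to the smaller side.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0  # exactness
    recall = overlap / sum(ref.values()) if ref else 0.0       # completeness
    return precision, recall
```

Precision divides the overlap by the candidate's length (how much of what was extracted is correct), while recall divides by the reference's length (how much of the reference was covered), matching the exactness/completeness distinction drawn in the abstract.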



INTRODUCTION

Online web content is growing at an increasing speed, so people need to form a crisp overview of a large number of articles in a short time. The aim of multi-document update summary generation is to construct a summary conveying the main stream of information from a collection of documents, under the hypothesis that the user has already read a set of previous documents. This sort of summarization has proved significantly helpful in tracing news stories: only new information needs to be summarized if we already know a little about the story. To provide more semantic information, the guided summarization task was introduced by the Text Analysis Conference (TAC). It aims to produce semantic summaries by using a list of important aspects.

