Semantic Text Summarization Based on Syntactic Patterns

Mohamed H Haggag

doi:10.4018/ijirr.2013100102

Abstract

Text summarization is the machine-based generation of a shortened version of a text. The summary should be a non-redundant extract from the original text. Most research on text summarization uses sentence extraction rather than abstraction to produce a summary. Extraction relies mainly on sentences already contained in the original input, which makes it more accurate and more concise than abstraction. However, when all input articles revolve around a particular event, extracting similar sentences would produce a highly repetitive summary. In this paper, a novel text summarization model is proposed that produces an extract by removing non-effective sentences from the text. The model applies semantic analysis by evaluating sentence similarity. This similarity is computed from the similarity of individual words as well as from the syntactic relationships between neighboring words; these relationships are referred to throughout the model as syntactic patterns. The semantic processing of matched patterns uses word senses and the corresponding part of speech of each word in its context. Knowledge of syntactic patterns supports text reduction by mapping matched patterns onto summarized ones. In addition, syntactic patterns are used in evaluating sentence relatedness to decide which sentences to keep and which to drop. Experiments show that the proposed model performs well in terms of compression rate, accuracy, recall, and human-judged criteria such as correctness, novelty, fluency, and usefulness.
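The abstract does not give the model's similarity formulas, so the following is only a minimal sketch of the general idea under stated assumptions: sentence similarity combines word-level similarity with similarity of neighboring-word pairs, and a sentence is dropped when it is too similar to one already kept. All names and values here are hypothetical and are not taken from the paper. A toy synonym table stands in for a lexical resource such as WordNet. Adjacent word pairs serve as a crude stand-in for syntactic patterns, with no parser, word-sense disambiguation, or part-of-speech tagging. The weight `alpha = 0.5` and the threshold `0.6` are arbitrary.

```python
import re

# HYPOTHETICAL toy synonym table standing in for a real lexical resource (e.g. WordNet).
SYNONYMS = {
    frozenset({"car", "automobile"}),
    frozenset({"big", "large"}),
}


def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def word_sim(a: str, b: str) -> float:
    """1.0 for identical words or listed synonyms, otherwise 0.0."""
    if a == b:
        return 1.0
    return 1.0 if frozenset({a, b}) in SYNONYMS else 0.0


def word_level_sim(t1: list[str], t2: list[str]) -> float:
    """Symmetric average of best word matches in each direction."""
    if not t1 or not t2:
        return 0.0

    def directed(x, y):
        return sum(max(word_sim(w, v) for v in y) for w in x) / len(x)

    return (directed(t1, t2) + directed(t2, t1)) / 2


def pattern_sim(t1: list[str], t2: list[str]) -> float:
    """Match adjacent word pairs (a crude ASSUMED stand-in for syntactic patterns).
    A pair matches only if both of its words match position by position."""
    p1, p2 = list(zip(t1, t1[1:])), list(zip(t2, t2[1:]))
    if not p1 or not p2:
        return 0.0

    def directed(x, y):
        return sum(
            max(min(word_sim(a, c), word_sim(b, d)) for c, d in y) for a, b in x
        ) / len(x)

    return (directed(p1, p2) + directed(p2, p1)) / 2


def sentence_sim(s1: str, s2: str, alpha: float = 0.5) -> float:
    """Weighted mix of word-level and pattern-level similarity (alpha is ASSUMED)."""
    t1, t2 = tokenize(s1), tokenize(s2)
    return alpha * word_level_sim(t1, t2) + (1 - alpha) * pattern_sim(t1, t2)


def drop_redundant(sentences: list[str], threshold: float = 0.6) -> list[str]:
    """Greedily keep a sentence only if it is below the (HYPOTHETICAL) similarity
    threshold against every sentence kept so far."""
    kept: list[str] = []
    for s in sentences:
        if all(sentence_sim(s, k) < threshold for k in kept):
            kept.append(s)
    return kept


docs = [
    "The big car stopped at the station.",
    "The large automobile stopped at the station.",
    "Passengers waited for the next train.",
]
summary = drop_redundant(docs)
print(summary)
```

Here the second sentence paraphrases the first through the toy synonyms, so both its words and its neighboring-word pairs match and it is dropped as redundant. The third sentence shares almost nothing with the first and is kept. A faithful implementation would replace the synonym table with sense-aware word similarity and the word pairs with parser-derived syntactic patterns, as the abstract describes.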
