Abstract

We argue that, in general, the analysis of lexical cohesion factors in a document can drive a summarizer, as well as enable other content characterization tasks. More narrowly, this paper focuses on how one particular cohesion factor, simple lexical repetition, can enhance an existing sentence extraction summarizer by enabling strategies for overcoming some particularly jarring end-user effects in the summaries, typically due to coherence degradation, readability deterioration, and topical under-representation. Lexical repetition is instrumental to, among other things, the topical make-up of a text. In our framework, a lexical repetition-based model of discourse segmentation, capable of detecting topic shifts, is integrated with a linguistically aware summarizer that utilizes notions of salience and dynamically adjustable summary size. We show that, even by leveraging lexical repetition alone, the resulting summaries are comparable in quality to, and under certain conditions better than, those delivered by a state-of-the-art summarizer. This is encouraging for a broad research platform focusing on the recognition and use of cohesive devices in text for a range of content characterization and document management tasks.
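
To make the idea of repetition-based topic-shift detection concrete, the following is a minimal sketch and not the model described in this paper. It assumes a TextTiling-style comparison: each gap between sentences is scored by the cosine similarity of the word counts in the windows on either side, and a low score is read as a possible topic shift. The function names, the small stopword list, the window size, and the threshold are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are",
             "it", "on", "for", "with", "at", "by"}

def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def gap_scores(sentences, window=2):
    """Score each gap between sentences by the word repetition across it."""
    bags = [Counter(tokenize(s)) for s in sentences]
    scores = []
    for gap in range(1, len(bags)):
        # Merge up to `window` sentences on each side of the gap.
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores

def topic_shifts(sentences, window=2, threshold=0.1):
    """Return indices i where a topic shift is hypothesised just before sentence i."""
    scores = gap_scores(sentences, window)
    return [i + 1 for i, score in enumerate(scores) if score < threshold]

sentences = [
    "The summarizer extracts sentences from the document.",
    "Each extracted sentence is scored by the summarizer.",
    "The summarizer ranks sentences by salience.",
    "Rainfall was heavy across the valley.",
    "The valley flooded after the rainfall.",
    "Heavy rainfall closed roads in the valley.",
]
print(topic_shifts(sentences))  # [3]
```

In this toy input, no content word is repeated across the gap before the fourth sentence, so that gap scores zero and is the only one flagged. A fixed threshold is the simplest possible decision rule; it is used here only to keep the sketch short.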
