Abstract

AbstractThis article proposes a summarization system for multiple documents. It employs not only named entities and other signatures to cluster news from different sources, but also employs punctuation marks, linking elements, and topic chains to identify the meaningful units (MUs). Using nouns and verbs to identify the similar MUs, focusing and browsing models are applied to represent the summarization results. To reduce information loss during summarization, informative words in a document are introduced. For the evaluation, a question answering system (QA system) is proposed to substitute the human assessors. In large‐scale experiments containing 140 questions to 17,877 documents, the results show that those models using informative words outperform pure heuristic voting‐only strategy by news reporters. This model can be easily further applied to summarize multilingual news from multiple sources.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.