Abstract
Automatic text summarization aims to reduce the length of input documents while preserving the most important information. A key challenge in automatic summarization is therefore to estimate the importance of information. Most extractive summarization systems, however, consider only bigrams as the representation from which importance is estimated. The potential of other text annotations, such as frames or named entities, remains unexplored. In this paper, we evaluate the application potential of linguistic annotations for automatic text summarization. To this end, we extend a previously presented summarization system by replacing bigrams with a multitude of different linguistic annotation types, including n-grams, verb stems, frames, concepts, chunks, connotation frames, entity types, and discourse relation sense-types. We propose two novel evaluation methods to assess information importance detection capabilities. In our experiments, bigrams show the best overall performance when source document sentences have to be ranked. These results support the decision of summarization system developers to use bigrams in summarization systems. However, other annotation types perform better if the model has to distinguish between source and reference sentences.