Evaluating the use of different positional strategies for sentence selection in biomedical literature summarization

Laura Plaza, Jorge Carrillo-de-Albornoz

doi:10.1186/1471-2105-14-71

Abstract

Background

The position of a sentence in a document has traditionally been considered an indicator of its relevance, and it is therefore frequently used by automatic summarization systems as an attribute for sentence selection. Sentences close to the beginning of the document are assumed to deal with the main topic and are thus selected for the summary. This criterion has been shown to be very effective when summarizing some types of documents, such as news items. However, this property is unlikely to hold for other types of documents, such as scientific articles, where other positional criteria may be preferable. The purpose of the present work is to study the utility of different positional strategies for biomedical literature summarization.

Results

We have evaluated three positional strategies: (1) favoring sentences at the beginning of the document, (2) favoring those at the beginning and end of the document, and (3) weighting sentences according to the section in which they appear. To this end, we have implemented two summarizers, one based on semantic graphs and the other on concept frequencies, and used ROUGE metrics to evaluate the summaries they produce when combined with each of the positional strategies above. Our results indicate that it is possible to improve summary quality by weighting sentences according to the section in which they appear (≈17% improvement in ROUGE-2 for the graph-based summarizer and ≈20% for the frequency-based summarizer), and that the sections containing the most salient information are Methods and Material, and Discussion and Results.

Conclusions

Traditional positional criteria that favor sentences at the beginning and/or the end of the document were found not to be helpful when summarizing scientific literature. In contrast, a more appropriate strategy is to weight sentences according to the section in which they appear.
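The three positional strategies compared in the abstract can be illustrated with a small sketch. It assumes that each sentence already carries a content score in [0, 1] from the underlying summarizer and a known section label, and it combines the two scores by linear interpolation with a hypothetical parameter `alpha`. The decay functions, the `SECTION_WEIGHTS` values, and the names `Sentence`, `select`, `lead_weight`, `lead_and_end_weight`, and `section_weight` are illustrative assumptions, not the weighting scheme actually used in this work.

```python
from typing import List, NamedTuple


class Sentence(NamedTuple):
    text: str
    section: str          # section label, assumed known for every sentence
    content_score: float  # score from the underlying summarizer, assumed in [0, 1]


def lead_weight(index: int, n: int) -> float:
    """Strategy 1: favor sentences at the beginning (linear decay)."""
    return 1.0 - index / n


def lead_and_end_weight(index: int, n: int) -> float:
    """Strategy 2: favor sentences at the beginning and at the end (U-shape)."""
    half = max((n - 1) / 2, 1)
    distance = min(index, n - 1 - index) / half  # 0 at either edge, 1 in the middle
    return 1.0 - distance


# Strategy 3: hypothetical per-section weights. Methods and Results/Discussion get
# the highest weight, in line with the findings summarized above; the numbers
# themselves are illustrative only.
SECTION_WEIGHTS = {"introduction": 0.5, "methods": 1.0, "results": 1.0,
                   "discussion": 1.0, "conclusion": 0.5}


def section_weight(section: str) -> float:
    return SECTION_WEIGHTS.get(section.lower(), 0.5)


def select(sentences: List[Sentence], k: int, strategy: str, alpha: float = 0.5) -> List[str]:
    """Return the k best sentences, in document order, under one positional strategy."""
    n = len(sentences)

    def positional(i: int, s: Sentence) -> float:
        if strategy == "lead":
            return lead_weight(i, n)
        if strategy == "lead_end":
            return lead_and_end_weight(i, n)
        if strategy == "section":
            return section_weight(s.section)
        raise ValueError(f"unknown strategy: {strategy}")

    # Linear interpolation of content and positional scores (an assumption).
    scored = [((1 - alpha) * s.content_score + alpha * positional(i, s), i)
              for i, s in enumerate(sentences)]
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    return [sentences[i].text for _, i in sorted(best, key=lambda t: t[1])]


doc = [Sentence("S1", "introduction", 0.4), Sentence("S2", "methods", 0.6),
       Sentence("S3", "results", 0.5), Sentence("S4", "conclusion", 0.4)]
print(select(doc, 2, "lead"))     # → ['S1', 'S2']
print(select(doc, 2, "section"))  # → ['S2', 'S3']
```

With the section strategy, the Methods and Results sentences displace the Introduction sentence that the lead strategy keeps. Linear interpolation is only one way to combine the two scores; a multiplicative combination would be an equally plausible assumption.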

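ROUGE-2, the metric behind the improvements quoted in the abstract, measures bigram overlap between a candidate summary and one or more human-written reference summaries. The sketch below computes a simplified ROUGE-2 recall using only lowercasing and whitespace tokenization. It is an approximation under those assumptions: the official ROUGE toolkit also supports stemming, stopword removal, and other options that change the scores. The function names `bigrams` and `rouge_2_recall` are hypothetical.

```python
from collections import Counter
from typing import List


def bigrams(tokens: List[str]) -> Counter:
    """Multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge_2_recall(candidate: str, references: List[str]) -> float:
    """Clipped bigram overlap divided by the total number of reference bigrams."""
    cand = bigrams(candidate.lower().split())
    matched = total = 0
    for ref in references:
        ref_bigrams = bigrams(ref.lower().split())
        # A reference bigram is matched at most as often as it occurs in the candidate.
        matched += sum(min(count, cand[bg]) for bg, count in ref_bigrams.items())
        total += sum(ref_bigrams.values())
    return matched / total if total else 0.0


print(rouge_2_recall("the cat sat on the mat", ["the cat lay on the mat"]))  # → 0.6
```

Here three of the five reference bigrams ("the cat", "on the", "the mat") appear in the candidate, giving a recall of 3/5.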
Highlights

In the present work, we study whether different positional criteria can help when summarizing scientific biomedical articles.

Summary

Introduction

Introduction and Motivation

The amount of biomedical literature being published has grown rapidly in recent years, making it difficult for researchers to find the information they need. Biomedical summarization systems typically adapt existing methods from domain-independent summarization to deal with the highly specialized biomedical terminology. To this end, they use external knowledge sources to represent texts as sets of domain concepts and relations. A pioneering work in biomedical summarization is found in [25]. The authors propose using semantic predications provided by SemRep [26] and information from the Unified Medical Language System (UMLS) [27] to extract biomedical entities and relations and to generate semantic-level abstracts, which are presented in graphical format. Reeve et al. [1] use the frequency of the UMLS Metathesaurus concepts found in the text and adapt the lexical chaining approach [29] to deal with concepts instead of terms. Their system is used to produce single-document extracts of biomedical articles.
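Frequency-based approaches such as the one attributed to Reeve et al. [1] rank sentences by how often their concepts occur in the document. The following is a generic, minimal sketch of that idea, not a reproduction of Reeve et al.'s system, which also relies on lexical chaining. It assumes each sentence has already been mapped to a list of concept identifiers (in practice this mapping would come from a UMLS-based concept recognizer, which is not shown) and scores a sentence by the average document frequency of its concepts, scaled by the frequency of the most common concept. The function names and the placeholder labels `c1`, `c2`, `c3` are hypothetical.

```python
from collections import Counter
from typing import List


def concept_frequency_scores(sentence_concepts: List[List[str]]) -> List[float]:
    """Score each sentence by the mean document frequency of its concepts, scaled to [0, 1]."""
    # Count how often each concept occurs across the whole document.
    freq = Counter(c for concepts in sentence_concepts for c in concepts)
    max_freq = max(freq.values(), default=0)
    scores = []
    for concepts in sentence_concepts:
        if not concepts or max_freq == 0:
            scores.append(0.0)  # sentences with no mapped concepts get no credit
        else:
            mean = sum(freq[c] for c in concepts) / len(concepts)
            scores.append(mean / max_freq)
    return scores


def top_k_sentences(sentences: List[str], sentence_concepts: List[List[str]], k: int) -> List[str]:
    """Return the k highest-scoring sentences in their original document order."""
    scores = concept_frequency_scores(sentence_concepts)
    best = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(best)]


# Hypothetical document: sentence ids and placeholder concept labels.
sentences = ["S1", "S2", "S3", "S4"]
concepts = [["c1", "c2"], ["c1"], ["c3"], []]
print(concept_frequency_scores(concepts))       # → [0.75, 1.0, 0.5, 0.0]
print(top_k_sentences(sentences, concepts, 2))  # → ['S1', 'S2']
```

Averaging rather than summing concept frequencies keeps long sentences from being favored simply because they mention more concepts; that choice is an assumption of this sketch.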

Objectives

Methods

Results

Discussion

Conclusion