Abstract

Background

Both intra- and inter-sentential semantic relations in biomedical texts provide valuable information for biomedical research. However, most existing methods either focus on intra-sentential relations and ignore inter-sentential ones, or extract inter-sentential relations inaccurately and treat the instances containing entity relations as independent, which neglects the interactions between relations. We propose a novel sequence labeling-based biomedical relation extraction method named Bio-Seq. In this method, the sequence labeling framework is extended with multiple specialized feature extractors to facilitate feature extraction at different levels, especially at the inter-sentential level. In addition, the sequence labeling framework enables Bio-Seq to take advantage of the interactions between relations, which further improves the precision of document-level relation extraction.

Results

Our proposed method obtained an F1-score of 63.5% on the BioCreative V chemical disease relation corpus and an F1-score of 54.4% on inter-sentential relations, which was 10.5% better than the document-level classification baseline. Our method also achieved an F1-score of 85.1% on the n2c2-ADE sub-dataset.

Conclusion

Sequence labeling can be successfully used to extract document-level relations, and is especially effective at boosting performance on inter-sentential relation extraction. Our work can facilitate research on document-level biomedical text mining.

Highlights

  • Both intra- and inter-sentential semantic relations in biomedical texts provide valuable information for biomedical research

  • Task description: Each document in the chemical disease relation (CDR) corpus consists of a title and an abstract. It has been manually annotated with chemical and disease mentions, their associated Medical Subject Headings concept identifiers (MeSH® IDs) [4], and their document-level relations.

  • Datasets and experimental settings: The Bio-Seq method is evaluated on two datasets, the CDR and n2c2-Adverse drug event (ADE) corpora, which model relations between chemicals and diseases at the document level in biomedical literature and between drugs and ADEs at the mention level in clinical notes, respectively.


Summary

Introduction

Both intra- and inter-sentential semantic relations in biomedical texts provide valuable information for biomedical research. In the chemical disease relation (CDR) corpus, unlike traditional sentence-level relation classification tasks (e.g. SemEval-2010 Task 8 [5]), the chemical-induced disease (CID) relations are annotated only at the document level (i.e. without specifying the sentence that conveys a relation). With document-level annotation, it is hard to tell which sentence(s) convey a specific relation, since an entity can be mentioned multiple times in different sentences of an abstract and the offsets of the related entities, which could identify the unique mention of an entity in an abstract, are not given. Inter-sentential relations account for approximately 1/3 of all relations, which suggests that traditional sentence-level relation extraction methods may not produce satisfactory results.
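To make the intra-/inter-sentential distinction concrete, the following minimal sketch labels a document-level relation by checking whether its two entities are ever mentioned in the same sentence. This is one common convention and is assumed here for illustration only; it is not taken from the Bio-Seq method. The `Mention` type, the `relation_scope` function, and the concept identifiers `C1`, `D1`, and `D2` are hypothetical placeholders.

```python
from typing import Iterable, NamedTuple


class Mention(NamedTuple):
    """One annotated entity mention (hypothetical structure for illustration)."""
    concept_id: str      # concept identifier, e.g. a MeSH ID in the CDR corpus
    sentence_index: int  # index of the sentence containing the mention


def relation_scope(chemical_id: str, disease_id: str,
                   mentions: Iterable[Mention]) -> str:
    """Label a document-level relation as 'intra' or 'inter'.

    Assumed convention: the relation is intra-sentential if at least one
    sentence mentions both entities, and inter-sentential otherwise.
    """
    mentions = list(mentions)
    chemical_sentences = {m.sentence_index for m in mentions
                          if m.concept_id == chemical_id}
    disease_sentences = {m.sentence_index for m in mentions
                         if m.concept_id == disease_id}
    if not chemical_sentences or not disease_sentences:
        raise ValueError("both entities must be mentioned in the document")
    # A shared sentence index means the pair co-occurs in some sentence.
    return "intra" if chemical_sentences & disease_sentences else "inter"


# Toy document: the chemical C1 appears in sentences 0 and 2,
# disease D1 in sentence 2, and disease D2 only in sentence 4.
toy_mentions = [
    Mention("C1", 0),
    Mention("C1", 2),
    Mention("D1", 2),
    Mention("D2", 4),
]
scope_c1_d1 = relation_scope("C1", "D1", toy_mentions)
scope_c1_d2 = relation_scope("C1", "D2", toy_mentions)
```

In the toy document, the pair (`C1`, `D1`) shares sentence 2, while (`C1`, `D2`) never co-occurs in a sentence, so a sentence-level extractor could not recover the second relation at all.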
