Abstract

Complex noun phrases are pervasive in biomedical texts, but are largely underexplored in entity discovery and information extraction. Such expressions often contain a mix of highly specific names (diseases, drugs, etc.) and common words such as “condition”, “degree” and “process”. These words can have different semantic types depending on their context within a noun phrase. In this paper, we address the task of classifying these common words into fine-grained semantic types: for instance, “condition” can be typed as “symptom and finding” or “configuration and setting”. For information extraction tasks, it is crucial to consider common nouns only when they really carry biomedical meaning; hence the classifier must also detect the negative case, in which a noun is merely used in a generic, uninformative sense. Our solution harnesses a small number of labeled seeds and employs label propagation, a semisupervised learning method on graphs. Experiments on 50 frequent nouns show that our method computes semantic labels with a microaveraged accuracy of 91.34%.
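To make the idea of label propagation from a few seeds concrete, here is a minimal sketch on a toy graph. It is an illustration of the general technique only, not the authors' implementation: the function name `propagate_labels`, the four-node graph, the edge weights and the label name "generic" for the negative case are all assumptions made for this example. Each node stands for one occurrence of a target word in a noun phrase, and each edge weight stands for some similarity between two occurrences.

```python
def propagate_labels(weights, seeds, labels, iterations=50):
    """Iterative label propagation with clamped seeds (illustrative sketch).

    weights: symmetric n x n list of lists with non-negative edge weights.
    seeds:   dict mapping node index -> known label.
    labels:  list of all possible labels.
    Returns the predicted label for every node.
    """
    n = len(weights)

    def one_hot(label):
        return [1.0 if candidate == label else 0.0 for candidate in labels]

    # Seeds start with a one-hot distribution, unlabeled nodes with zeros.
    scores = [one_hot(seeds[i]) if i in seeds else [0.0] * len(labels)
              for i in range(n)]

    for _ in range(iterations):
        updated = []
        for i in range(n):
            total = sum(weights[i])
            if i in seeds:
                # Clamp seed nodes to their known label in every round.
                updated.append(one_hot(seeds[i]))
            elif total == 0:
                # Isolated node: nothing to propagate, keep current scores.
                updated.append(scores[i])
            else:
                # Weighted average of the neighbours' label scores.
                updated.append([
                    sum(weights[i][j] * scores[j][k] for j in range(n)) / total
                    for k in range(len(labels))
                ])
        scores = updated

    # Pick the highest-scoring label for each node.
    return [labels[max(range(len(labels)), key=lambda k: scores[i][k])]
            for i in range(n)]


# Hypothetical toy data: nodes 0 and 3 are labeled seeds, 1 and 2 are unlabeled.
labels = ["symptom and finding", "generic"]
weights = [
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.9],
    [0.0, 0.0, 0.9, 0.0],
]
seeds = {0: "symptom and finding", 3: "generic"}
predicted = propagate_labels(weights, seeds, labels)
print(predicted)
```

In this toy graph, node 1 is strongly tied to the "symptom and finding" seed and node 2 to the "generic" seed, so each unlabeled node inherits the label of its closest seed. How the graph and its weights are actually built for biomedical noun phrases is not specified in this excerpt.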

Highlights

  • In biomedical texts, entities are written as natural language expressions, often complex noun phrases

  • We focus on a judiciously chosen list of common nouns, referred to as target words, that frequently appear within long noun phrases in biomedical texts

  • We develop a semisupervised method for labeling a target word, within a given noun phrase, with its most suitable semantic type or tagging it as biomedically unspecific and uninformative

Summary

Introduction

In biomedical texts, entities are written as natural language expressions, often complex noun phrases. Previous work on information extraction in this domain has focused on short phrases, which work well, for instance, with dictionary-based approaches. Yet expressions are often long and complex, mixing domain-specific names (of diseases, symptoms, drugs, etc.) with common nouns such as “condition”, “degree” or “process”. Examples of such complex phrases are: 1) monitoring of the carcinogenic process; 2) development of processes for the prognosis of malaria. In the second case, “process” is used in the generic sense of the common noun and is relatively uninformative for the purpose of detecting biomedical entities in text. In the first case, we would like to further annotate the common noun with a semantic type that captures the role of the word within the surrounding noun phrase.

