Abstract

BackgroundBiomedical ontologies are critical for integration of data from diverse sources and for use by knowledge-based biomedical applications, especially natural language processing as well as associated mining and reasoning systems. The effectiveness of these systems is heavily dependent on the quality of the ontological terms and their classifications. To assist in developing and maintaining the ontologies objectively, we propose automatic approaches to classify and/or validate their semantic categories. In previous work, we developed an approach using contextual syntactic features obtained from a large domain corpus to reclassify and validate concepts of the Unified Medical Language System (UMLS), a comprehensive resource of biomedical terminology. In this paper, we introduce another classification approach based on words of the concept strings and compare it to the contextual syntactic approach.ResultsThe string-based approach achieved an error rate of 0.143, with a mean reciprocal rank of 0.907. The context-based and string-based approaches were found to be complementary, and the error rate was reduced further by applying a linear combination of the two classifiers. The advantage of combining the two approaches was especially manifested on test data with sufficient contextual features, achieving the lowest error rate of 0.055 and a mean reciprocal rank of 0.969.ConclusionThe lexical features provide another semantic dimension in addition to syntactic contextual features that support the classification of ontological concepts. The classification errors of each dimension can be further reduced through appropriate combination of the complementary classifiers.

Highlights

  • Biomedical ontologies are critical for integration of data from diverse sources and for use by knowledge-based biomedical applications, especially natural language processing as well as associated mining and reasoning systems

  • Error rates of individual approaches The error rates computed by formula 4 are presented in Table 1, where N represents the number of Concept Unique Identifier (CUI) tested

  • The advantage of the string-based approach over the distributional approach was stronger on test sets where the latter obtained fewer features

Read more

Summary

Introduction

Biomedical ontologies are critical for integration of data from diverse sources and for use by knowledge-based biomedical applications, especially natural language processing as well as associated mining and reasoning systems. The effectiveness of these systems is heavily dependent on the quality of the ontological terms and their classifications. The ontologies specify the definitions of biomedical terms and associate them with normalized concepts and semantic categories within the ontological structures They provide abundant lexical and semantic knowledge that is especially valuable to Natural Language Processing (NLP) systems. Spasic et al [9] summarized various approaches that applied ontologies in biomedical NLP tasks

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.