Abstract

Background: A variety of informatics approaches have been developed that use information retrieval, NLP and text-mining techniques to identify biomedical concepts and relations within scientific publications or their sentences. These approaches have not typically addressed the challenge of extracting more complex knowledge such as biomedical definitions. In our efforts to facilitate knowledge acquisition of rule-based definitions of autism phenotypes, we have developed a novel semantic-based text-mining approach that can automatically identify such definitions within text.

Results: Using an existing knowledge base of 156 autism phenotype definitions and an annotated corpus of 26 source articles containing such definitions, we evaluated and compared the average rank of the correct rule definition or corresponding rule template using both our semantic-based approach and a standard term-based approach. We examined three separate scenarios: (1) the snippet of text contained a definition already in the knowledge base; (2) the snippet contained an alternative definition for a concept in the knowledge base; and (3) the snippet contained a definition not in the knowledge base. Our semantic-based approach ranked the correct definition higher (that is, it achieved a lower average rank) than the term-based approach in each of the three scenarios (scenario 1: 3.8 vs. 5.0; scenario 2: 2.8 vs. 4.9; scenario 3: 4.5 vs. 6.2), and each comparison was significant at p < 0.05 using the Wilcoxon signed-rank test.

Conclusions: Our work shows that leveraging existing domain knowledge in the information extraction of biomedical definitions significantly improves the correct identification of such knowledge within sentences. Our method can thus help researchers rapidly acquire knowledge about biomedical definitions that are specified, and continue to evolve, within an ever-growing corpus of scientific publications.
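The evaluation above compares, per text snippet, the rank at which each method places the correct rule, and then tests the paired differences with the Wilcoxon signed-rank test. The sketch below shows one way such a comparison could be computed; the rank lists are invented toy numbers, not the paper's data, and the test uses a plain normal approximation without tie or continuity correction (in practice `scipy.stats.wilcoxon` would be the usual choice). A lower rank is better, since rank 1 means the correct rule was the top candidate.

```python
import math
from statistics import mean


def rank_of_correct(ranked_ids, correct_id):
    """Return the 1-based position of the correct rule in a ranked list."""
    return ranked_ids.index(correct_id) + 1


def wilcoxon_signed_rank(x, y):
    """Paired, two-sided Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped and tied |differences| share the average
    of their ranks. No tie or continuity correction is applied to the
    variance, so p-values for small samples are only approximate.
    Returns (W, z, p) with W = min(W+, W-).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p


# Toy ranks of the correct definition for 8 snippets (invented, not the paper's data).
semantic = [1, 2, 1, 3, 2, 1, 4, 2]
term = [3, 4, 2, 5, 2, 3, 6, 5]
print(mean(semantic), mean(term))
w, z, p = wilcoxon_signed_rank(semantic, term)
print(w, round(p, 3))
```

With these toy ranks the snippet where both methods tie is dropped, every remaining difference favors the first method, so W is 0 and the approximate two-sided p-value falls below 0.05.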

Highlights

  • A variety of informatics approaches have been developed that use information retrieval, NLP and text-mining techniques to identify biomedical concepts and relations within scientific publications or their sentences

  • We have focused on the extraction of phenotype definitions from scientific articles and their acquisition as Semantic Web Rule Language (SWRL) [5] rule statements in pre-existing Web Ontology Language (OWL) [6] ontologies

  • In this work, formalized phenotype definitions are equivalent to the defining phenotype rules in the domain knowledge base, and we investigated the relevance of our results separately in each of the three possible scenarios
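Identifying a definition in text amounts to ranking the rules of the knowledge base against a snippet. This page does not describe how either compared method scores candidates, so the sketch below is only a generic illustration, not the authors' algorithm: a term-based scorer takes the cosine similarity of raw token counts, while a concept-based variant first maps tokens to ontology concept IDs through a lexicon. The lexicon, the rule texts, and the rule IDs `R1`/`R2` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def term_vector(text):
    """Term-based representation: raw token counts."""
    return Counter(tokens(text))


# Hypothetical lexicon mapping surface terms to ontology concept IDs.
LEXICON = {
    "delay": "DELAY", "delayed": "DELAY",
    "phrase": "PHRASE_SPEECH", "phrases": "PHRASE_SPEECH",
    "after": "AFTER", "later": "AFTER",
    "month": "MONTH", "months": "MONTH",
}


def concept_vector(text):
    """Concept-based representation: counts of lexicon concepts only."""
    return Counter(LEXICON[t] for t in tokens(text) if t in LEXICON)


def rank_rules(snippet, rules, vectorize):
    """Return rule IDs ordered by decreasing similarity to the snippet."""
    query = vectorize(snippet)
    scored = [(cosine(query, vectorize(text)), rid) for rid, text in rules.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rid for _, rid in scored]


# Hypothetical knowledge-base rules, described in prose.
RULES = {
    "R1": "children whose ADI-R social domain score meets the cutoff",
    "R2": "language delay defined by onset of phrase speech after 33 months",
}
snippet = "children with delayed phrases later than 33 months"
print(rank_rules(snippet, RULES, term_vector))
print(rank_rules(snippet, RULES, concept_vector))
```

Both scorers rank `R2` first here, but the term-based similarity to `R2` is only about 0.21 because "delayed"/"delay" and "later"/"after" do not match as strings, whereas the concept vectors coincide exactly (similarity 1.0).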


Summary

Introduction

A variety of informatics approaches have been developed that use information retrieval, NLP and text-mining techniques to identify biomedical concepts and relations within scientific publications or their sentences. These approaches have not typically addressed the challenge of extracting more complex knowledge such as biomedical definitions. A secondary problem is to identify the portion of text within a retrieved … We have addressed this problem in our efforts to support the information extraction needs of mental health experts who are developing a knowledge-based catalog of autism phenotypes [3]. Such phenotype concepts are represented as classes within a domain ontology and defined more precisely as rules expressing numeric or temporal cut-offs of measurements on standardized diagnostic tests [3]. Helping clinical and genetics researchers acquire and maintain such rule-based classifications of phenotypes can facilitate their cataloging, comparison, and validation, and enable the use of standardized biomedical definitions for robust, reproducible phenotype-genotype analyses.
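Phenotype classes here are defined by rules with numeric or temporal cut-offs on standardized diagnostic measurements, and the highlights note that such rules are stored as SWRL statements in OWL ontologies. The Python sketch below only mirrors the logic of such a rule (a conjunction of threshold conditions, like the body of a SWRL rule); it is not SWRL, and the phenotype name, measurement names, and cut-off values are hypothetical, not clinically validated definitions.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Tuple

# Comparison operators allowed in a cut-off condition.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass(frozen=True)
class Condition:
    measure: str      # hypothetical measurement name, e.g. a test score
    op: str           # key into OPS
    threshold: float  # numeric cut-off (a score, or an age in months)


@dataclass(frozen=True)
class PhenotypeRule:
    phenotype: str
    conditions: Tuple[Condition, ...]  # all must hold (a conjunction)

    def applies_to(self, subject: Dict[str, float]) -> bool:
        """True if every condition holds; a missing measurement fails the rule."""
        return all(
            c.measure in subject and OPS[c.op](subject[c.measure], c.threshold)
            for c in self.conditions
        )


# Hypothetical rule: first phrases after 33 months (temporal cut-off)
# and a made-up verbal score of at least 8 (numeric cut-off).
rule = PhenotypeRule(
    "PhraseSpeechDelay",
    (
        Condition("age_first_phrases_months", ">", 33),
        Condition("verbal_score", ">=", 8),
    ),
)
print(rule.applies_to({"age_first_phrases_months": 40, "verbal_score": 9}))
print(rule.applies_to({"age_first_phrases_months": 24, "verbal_score": 9}))
```

Keeping each cut-off as explicit data (measure, operator, threshold) rather than hard-coded logic is what makes alternative definitions of the same phenotype easy to compare side by side.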

