Improved characterisation of clinical text through ontology-based vocabulary expansion

Luke T Slater, Georgios V Gkoutos, William Bradlow, Robert Hoehndorf, Simon Ball

doi:10.1186/s13326-021-00241-5

Abstract

Background: Biomedical ontologies contain a wealth of metadata that constitutes a fundamental infrastructural resource for text mining. For several reasons, redundancies exist in the ontology ecosystem, which lead to the same entities being described by several concepts in the same or similar contexts across several ontologies. While these concepts describe the same entities, they contain different sets of complementary metadata. Linking these definitions to make use of their combined metadata could lead to improved performance in ontology-based information retrieval, extraction, and analysis tasks.

Results: We develop and present an algorithm that expands the set of labels associated with an ontology class using a combination of strict lexical matching and cross-ontology reasoner-enabled equivalency queries. Across all disease terms in the Disease Ontology, the approach found 51,362 additional labels, more than tripling the number defined by the ontology itself. Manual validation by a clinical expert on a random sample of expanded synonyms over the Human Phenotype Ontology yielded a precision of 0.912. Furthermore, we found that annotating patient visits in MIMIC-III with an extended set of Disease Ontology labels led to the semantic similarity score derived from those labels being a significantly better predictor of matching first diagnosis, with a mean average precision of 0.88 for the unexpanded set of annotations, and 0.913 for the expanded set.

Conclusions: Inter-ontology synonym expansion can lead to a vast increase in the scale of vocabulary available for text mining applications. While the accuracy of the extended vocabulary is not perfect, it nevertheless led to a significantly improved ontology-based characterisation of patients from text in one setting. Furthermore, where run-on error is not acceptable, the technique can be used to provide candidate synonyms which can be checked by a domain expert.

Highlights

Biomedical ontologies contain a wealth of metadata that constitutes a fundamental infrastructural resource for text mining
Open Biomedical Ontologies (OBO) [3] and the Information Artifact Ontology (IAO) [4] define a series of conventional annotation properties that can be used for the expression of labels and synonyms
The synonym expansion algorithm is available as part of the Komenti text mining framework, released under an open source licence at https://github.com/reality/komenti, while the files used for validation are available at https://github.com/reality/synonym_expansion_validation

Summary

Introduction

Biomedical ontologies contain a wealth of metadata that constitutes a fundamental infrastructural resource for text mining. Redundancies exist in the ontology ecosystem, which lead to the same entities being described by several concepts in the same or similar contexts across several ontologies. While these concepts describe the same entities, they contain different sets of complementary metadata. Open Biomedical Ontologies (OBO) [3] and the Information Artifact Ontology (IAO) [4] define a series of conventional annotation properties that can be used for the expression of labels and synonyms. These features are widely used: an investigation of ontologies in BioPortal found that 90% of classes had a label associated with them [5]. The labels associated with ontology terms constitute a controlled domain vocabulary [1].
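To make the idea of combining complementary labels concrete, the following is a minimal sketch of label expansion by strict lexical matching only. It is not the Komenti implementation: the ontology names, class identifiers, and labels are hypothetical toy data, the functions `normalise` and `expand_labels` are illustrative names, and the reasoner-enabled equivalency step described in the abstract is omitted entirely.

```python
from typing import Dict, Set

# Hypothetical toy data: class ID -> labels and synonyms.
# None of these identifiers or ontology names come from a real ontology.
Ontology = Dict[str, Set[str]]

source: Ontology = {"EX:0001": {"hypertension"}}
others: Dict[str, Ontology] = {
    "onto_a": {
        "A:1": {"Hypertension", "high blood pressure"},
        "A:2": {"asthma"},
    },
    "onto_b": {"B:7": {"High  Blood Pressure", "HTN"}},
}


def normalise(label: str) -> str:
    """Lowercase a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def expand_labels(class_id: str, source: Ontology,
                  others: Dict[str, Ontology]) -> Set[str]:
    """Collect labels from classes in other ontologies that share at
    least one exactly matching (normalised) label with the source class."""
    own = {normalise(label) for label in source[class_id]}
    expanded = set(own)
    for ontology in others.values():
        for labels in ontology.values():
            candidate = {normalise(label) for label in labels}
            # Match only against the ORIGINAL labels, not the growing
            # expanded set, so that one loose synonym cannot chain
            # further matches.
            if own & candidate:
                expanded |= candidate
    return expanded


result = expand_labels("EX:0001", source, others)
print(sorted(result))
```

In this sketch the class `B:7` is not merged, because it shares a label only with a newly acquired synonym rather than with an original label of the source class. Matching against the original labels alone is an assumption made here to limit propagated error; a transitive variant would retrieve more labels at a likely cost in precision.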



Journal: Journal of Biomedical Semantics
Publication Date: Apr 12, 2021
Citations: 7
License type: open-access
