Mapping Phenotypic Information in Heterogeneous Textual Sources to a Domain-Specific Terminological Resource.

Noha Alnazzawi, Sophia Ananiadou, Paul Thompson

doi:10.1371/journal.pone.0162287

Abstract

Biomedical literature articles and narrative content from Electronic Health Records (EHRs) both constitute rich sources of disease-phenotype information. Phenotype concepts may be mentioned in text in multiple ways, using phrases with a variety of structures. This variability stems partly from the different backgrounds of the authors, but also from the different writing styles typically used in each text type. Since EHR narrative reports and literature articles contain different but complementary types of valuable information, combining details from each text type can help to uncover new disease-phenotype associations. However, the alternative ways in which the same concept may be mentioned in each source constitute a barrier to the automatic integration of information. Accordingly, identification of the unique concepts represented by phrases in text can help to bridge the gap between text types. We describe our development of a novel method, PhenoNorm, which integrates a number of different similarity measures to allow automatic linking of phenotype concept mentions to known concepts in the UMLS Metathesaurus, a biomedical terminological resource. PhenoNorm was developed using the PhenoCHF corpus, a collection of literature articles and EHR narratives annotated for phenotypic information relating to congestive heart failure (CHF). We evaluate the performance of PhenoNorm in linking CHF-related phenotype mentions to Metathesaurus concepts, using a newly enriched version of PhenoCHF, in which each phenotype mention has an expert-verified link to a concept in the UMLS Metathesaurus. We show that PhenoNorm outperforms a number of alternative methods applied to the same task. Furthermore, we demonstrate PhenoNorm's wider utility by evaluating its ability to link mentions of various other types of medically-related information, occurring in texts covering wider subject areas, to concepts in different terminological resources. We show that PhenoNorm can maintain performance levels, and that its accuracy compares favourably to other methods applied to these tasks.
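
To make the idea of integrating several similarity measures concrete, the following is a minimal illustrative sketch in Python. It is not PhenoNorm itself: the abstract does not say which measures are combined or how, so the two measures used here (token-level Jaccard overlap and a character-level ratio from the standard library), the equal weights, the 0.5 threshold, and the toy concept dictionary with made-up identifiers are all assumptions chosen for illustration only.

```python
from difflib import SequenceMatcher

# Toy concept dictionary. The identifiers and synonym lists are hypothetical
# placeholders, NOT real UMLS Metathesaurus entries.
CONCEPTS = {
    "C_HYPERTENSION": ["hypertension", "high blood pressure"],
    "C_DYSPNOEA": ["dyspnoea", "shortness of breath"],
    "C_OEDEMA": ["oedema", "swelling of the legs"],
}


def token_jaccard(a: str, b: str) -> float:
    """Word-level overlap: |shared tokens| / |all tokens|."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def char_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_score(mention: str, synonym: str,
                   w_token: float = 0.5, w_char: float = 0.5) -> float:
    """Weighted sum of the two measures (weights are an assumption)."""
    return (w_token * token_jaccard(mention, synonym)
            + w_char * char_ratio(mention, synonym))


def normalise(mention: str, concepts=CONCEPTS, threshold: float = 0.5):
    """Return (concept_id, score) for the best-scoring synonym,
    or (None, score) if no synonym reaches the threshold."""
    best_id, best_score = None, 0.0
    for concept_id, synonyms in concepts.items():
        for synonym in synonyms:
            score = combined_score(mention, synonym)
            if score > best_score:
                best_id, best_score = concept_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


if __name__ == "__main__":
    for text in ["elevated blood pressure", "shortness of breath", "xyz"]:
        print(text, "->", normalise(text))
```

Combining a word-level and a character-level measure means that a mention can still be linked when it shares only some words with a known synonym, or when it differs by spelling; a real system would also need a far larger synonym inventory and tuned weights.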

Highlights

Human phenotypic information constitutes the observable traits of human beings (e.g., height, eye colour, etc.) resulting from genetic make-up and environmental influences.
We have evaluated the performance of PhenoNorm in normalising phenotype mentions in PhenoCHF, and we show that it achieves higher accuracy than other, more general normalisation methods when they are applied to the same task.

Summary

Introduction

Human phenotypic information constitutes the observable traits of human beings (e.g., height, eye colour, etc.) resulting from genetic make-up and environmental influences. Narrative EHR information includes details about individual patient diagnoses, medication, family history, past patient history, signs, symptoms and findings, whilst scientific articles tend to summarise the latest research findings, results and advances in knowledge relevant to different diseases [2, 3]. Given that these different types of information are often complementary, important details may be overlooked if only a single source (or text type) is considered. Automated methods that combine relevant details from different text types can therefore be extremely useful, both to discover extended information about a given concept (e.g., to gather alternative perspectives on the risk factors contributing to a given disease) and to uncover novel associations between diseases and phenotypes, which may be scattered amongst documents, both within a given text type and across different text types.

Methods

Results

Conclusion


Journal: PloS one	Publication Date: Sep 19, 2016
Citations: 12	License type: CC BY 4.0


