Automatic concept recognition using the human phenotype ontology reference and test suite corpora.

T Groza, G Baynam, D Smedley, F M Couto, A Zankl, S Kohler, A Oellrich, P N Robinson, N Collier, S Doelken

doi:10.1093/database/bav005

Abstract

Concept recognition tools rely on the availability of textual corpora to assess their performance and enable the identification of areas for improvement. Typically, corpora are developed for specific purposes, such as gene name recognition. Gene and protein name identification are longstanding goals of biomedical text mining, and therefore a number of different corpora exist. However, phenotypes only recently became an entity of interest for specialized concept recognition systems, and hardly any annotated text is available for performance testing and training. Here, we present a unique corpus, capturing text spans from 228 abstracts manually annotated with Human Phenotype Ontology (HPO) concepts and harmonized by three curators, which can be used as a reference standard for free text annotation of human phenotypes. Furthermore, we developed a test suite for standardized concept recognition error analysis, incorporating 32 different types of test cases corresponding to 2164 HPO concepts. Finally, three established phenotype concept recognizers (NCBO Annotator, OBO Annotator and Bio-LarK CR) were comprehensively evaluated, and results are reported against both the text corpus and the test suites. The gold standard and test suite corpora are available from http://bio-lark.org/hpo_res.html.

Database URL: http://bio-lark.org/hpo_res.html

Highlights

The Human Phenotype Ontology (HPO) [1] is widely used for the annotation of human phenotypes and has been employed in many biomedical applications aiming to understand the phenotypic consequences of genomic variation [2]
We present a unique corpus, capturing text spans from 228 abstracts manually annotated with Human Phenotype Ontology (HPO) concepts and harmonized by three curators, which can be used as a reference standard for free text annotation of human phenotypes
In order to create a better overview of the concepts captured in the corpus, we have mapped them to the 21 top-level phenotype abnormalities defined by HPO

Summary

Introduction

The Human Phenotype Ontology (HPO) [1] is widely used for the annotation of human phenotypes and has been employed in many biomedical applications aiming to understand the phenotypic consequences of genomic variation [2].

[...] ‘subtle flattening and squaring of the metacarpal heads’, ‘segmentation defects appear to affect L4-S1’; (v) complex intrinsic structure: the lexical structure of phenotype descriptions may take several forms. They may have a canonical form, i.e. a conjunction of well-defined quality-entity pairs, where entities represent, e.g. an anatomical structure in focus (e.g. thorax) and qualities denote certain characteristics of the entities (e.g. bell-shaped), resulting in the phenotype ‘bell-shaped thorax’. The latter three, in particular, make the identification of the boundaries of phenotype descriptions difficult.
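
As a minimal sketch of the canonical form described above, the following Python snippet models a phenotype description as a quality-entity pair and renders it as a phrase. The names `QualityEntityPair`, `render` and `render_conjunction` are hypothetical and introduced only for illustration; they are not part of HPO or of any of the evaluated concept recognizers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityEntityPair:
    """Hypothetical model of one canonical quality-entity pair."""

    quality: str  # characteristic of the entity, e.g. "bell-shaped"
    entity: str   # anatomical structure in focus, e.g. "thorax"

    def render(self) -> str:
        # In English the quality precedes the entity it describes.
        return f"{self.quality} {self.entity}"


def render_conjunction(pairs):
    """Join several quality-entity pairs into a single description."""
    return " and ".join(pair.render() for pair in pairs)


pair = QualityEntityPair(quality="bell-shaped", entity="thorax")
print(pair.render())
```

Descriptions that do not decompose into such pairs cannot be represented this simply, which gives a sense of why locating the boundaries of phenotype descriptions in free text is hard.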

Journal: Database	Publication Date: Feb 27, 2015
Citations: 62	License type: cc-by

Similar Papers

PhenoRerank: A re-ranking model for phenotypic concept recognition pre-trained on human phenotype ontology
Shankai Yan ... Zhiyong Lu
Journal of Biomedical Informatics | VOL. 129
26 Mar 2022

Finding Gene Names
Soumya Raychaudhuri
26 Jan 2006

FastHPOCR: pragmatic, fast, and accurate concept recognition using the human phenotype ontology.
Tudor Groza ... Gareth Baynam
Bioinformatics (Oxford, England) | VOL. 40
24 Jun 2024

A new synonym-substitution method to enrich the human phenotype ontology
Maria Taboada ... Ranga C Gudivada
BMC Bioinformatics | VOL. 18
10 Oct 2017