Abstract
The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence-variation data, and translational research, but a comparable resource has not been available for common disease. Here, we have developed a concept-recognition procedure that analyzes the frequencies of HPO disease annotations as identified in over five million PubMed abstracts by employing an iterative procedure to optimize precision and recall of the identified terms. We derived disease models for 3,145 common human diseases comprising a total of 132,006 HPO annotations. The HPO now comprises over 250,000 phenotypic annotations for over 10,000 rare and common diseases and can be used for examining the phenotypic overlap among common diseases that share risk alleles, as well as between Mendelian diseases and common diseases linked by genomic location. The annotations, as well as the HPO itself, are freely available.
Highlights
We first retrieved the Medical Subject Headings (MeSH) terms associated with PubMed abstracts and used them to retain only those abstracts focused on diseases. Of the 22,376,811 articles listed in PubMed, 5,136,645 had an abstract and could be assigned to such a MeSH disease term.
We filtered this initial set of Human Phenotype Ontology (HPO) terms by using a ranking-and-clustering method, with the aim of maximizing the F-score computed on a manually curated gold-standard set of 41 common diseases.
Translational research in Mendelian diseases has benefited enormously from databases of the phenotypic features associated with individual diseases, such as OMIM,[65] Orphanet,[66] and more recently the HPO.[1,2]
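To make the F-score criterion from the highlights concrete, the following is a minimal sketch of how a cutoff over a ranked list of candidate terms could be chosen to maximize F1 against a gold-standard set. It is an illustration under stated assumptions, not the authors' ranking-and-clustering method: the clustering step is not reproduced, and the function names (`f_score`, `best_cutoff`) and the term labels (`T1` to `T5`) are hypothetical placeholders.

```python
def f_score(predicted, gold, beta=1.0):
    """Return (precision, recall, F-beta) for two collections of term IDs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


def best_cutoff(ranked_terms, gold):
    """Pick the top-k prefix of a ranked term list that maximizes F1.

    Assumption: candidate terms are already sorted from most to least
    relevant; how that ranking is produced is outside this sketch.
    """
    best_k, best_f = 0, 0.0
    for k in range(1, len(ranked_terms) + 1):
        _, _, f = f_score(ranked_terms[:k], gold)
        if f > best_f:
            best_k, best_f = k, f
    return best_k, best_f


# Hypothetical placeholder terms, not real HPO identifiers.
ranked = ["T1", "T2", "T3", "T4", "T5"]
gold = {"T1", "T2", "T4"}
k, f = best_cutoff(ranked, gold)
print(k, round(f, 3))
```

In this toy input, keeping more terms raises recall but lowers precision once false positives enter the list, so the F1-optimal cutoff balances the two.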
Summary
The Human Phenotype Ontology (HPO) provides a structured, comprehensive, and well-defined set of over 11,000 classes (terms) that describe phenotypic abnormalities seen in human disease.[1,2] The HPO has been used for developing algorithms and computational tools for clinical differential diagnostics,[3,4,5] for the prioritization of candidate disease-associated genes,[6,7,8,9,10,11] in exome sequencing studies,[6,7,8,9,10] and for diagnostics in clinical exome sequencing.[11] In addition, the HPO has been used for translational research, including inferring novel drug indications,[12] characterizing the proteome of the human postsynaptic density,[13] analyzing Neandertal exomes,[14] and other topics.[15,16,17,18,19,20,21,22] The HPO project provides a standard phenotype terminology and a collection of disease-phenotype annotations, i.e., computational assertions that a disease is associated with a given phenotypic abnormality. These definitions are useful for a number of applications, including cross-species phenotype comparisons[6,28,29] and computational quality control.[30]
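As an illustration of what a collection of disease-phenotype annotations can look like in practice, the sketch below stores each disease as a set of phenotype term identifiers and compares two diseases by the overlap of their sets. The Jaccard index is used here only as a simple stand-in for phenotypic overlap; it is an assumption of this sketch rather than the measure used in the work summarized above, and it ignores the ontology hierarchy. The disease names and term identifiers are hypothetical placeholders, not real HPO or disease IDs.

```python
# Each disease maps to the set of phenotype terms it is annotated with.
# All names and identifiers below are placeholders for illustration.
annotations = {
    "disease_A": {"term_1", "term_2", "term_3"},
    "disease_B": {"term_2", "term_3", "term_4"},
    "disease_C": {"term_5"},
}


def jaccard(terms_a, terms_b):
    """Return the shared fraction of two term sets (0.0 if both are empty)."""
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


overlap_ab = jaccard(annotations["disease_A"], annotations["disease_B"])
overlap_ac = jaccard(annotations["disease_A"], annotations["disease_C"])
print(overlap_ab, overlap_ac)
```

A set-based representation keeps each annotation as a single assertion (disease, term), which matches the description of annotations as statements that a disease is associated with a given phenotypic abnormality.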