Abstract

Phenotypes are the observable characteristics of an organism arising from its response to the environment. Phenotypes associated with engineered and natural genetic variation are widely recorded using phenotype ontologies in model organisms, as are signs and symptoms of human Mendelian diseases in databases such as OMIM and Orphanet. Exploiting these resources, several computational methods have been developed for integration and analysis of phenotype data to identify the genetic etiology of diseases or suggest plausible interventions. A similar resource would be highly useful not only for rare and Mendelian diseases, but also for common, complex and infectious diseases. We apply a semantic text-mining approach to identify the phenotypes (signs and symptoms) associated with over 6,000 diseases. We evaluate our text-mined phenotypes by demonstrating that they can correctly identify known disease-associated genes in mice and humans with high accuracy. Using a phenotypic similarity measure, we generate a human disease network in which diseases that have similar signs and symptoms cluster together, and we use this network to identify closely related diseases based on common etiological, anatomical as well as physiological underpinnings.

Highlights

  • The study of environmentally based diseases requires understanding the response of organisms to environmental influences such as chemicals, radiation, habitat or society

  • Ontologies such as the Human Phenotype Ontology (HPO)[5] have been created in an attempt to provide a comprehensive controlled vocabulary and knowledge base describing the manifestations of human diseases, and these ontologies have been applied to characterize diseases in the Online Mendelian Inheritance in Man (OMIM) and Orphanet databases[6,7]

  • Phenotype information related to model organisms is being described using ontologies such as the Mammalian Phenotype Ontology (MP)[10], and data annotated with these ontologies is being systematically collected and organized in model organism databases[11]

Summary

Introduction

The study of environmentally based diseases requires understanding the response of organisms to environmental influences such as chemicals, radiation, habitat or society. The systematic coding of phenotypic and molecular information related to humans and other model species facilitates integrative approaches for identifying novel disease-related molecular information[7,12,13], for prioritizing candidate genes for diseases by comparing animal model phenotypes with human disease phenotypes[14,15], and for predicting novel drug-target interactions, drug targets and indications[16,17,18,19]. Extending these strategies and tools to the study of common and infectious diseases has been hampered by the lack of an infrastructure that provides the phenotypes associated with such diseases and integrates this information with the large volumes of experimentally verified and manually curated data available from model organisms. We make our results freely available at http://aber-owl.net/aber-owl/diseasephenotypes/ and provide a visualisation environment for them at http://aber-owl.net/aber-owl/diseasephenotypes/network/
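Candidate gene prioritization by phenotype comparison, as mentioned above, can be sketched roughly as follows. This is an illustrative simplification, not the cited methods: it assumes the animal model and disease phenotypes have already been mapped to a shared vocabulary (a step that is non-trivial in practice), it uses plain Jaccard similarity, and the gene names and `PHENO:` identifiers are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidate_genes(disease_pheno, model_pheno):
    """Rank genes by similarity of their model phenotypes to the disease.

    Ties are broken alphabetically by gene name so the order is stable.
    """
    scores = [
        (gene, jaccard(disease_pheno, phenos))
        for gene, phenos in model_pheno.items()
    ]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Made-up disease phenotype profile (placeholder IDs).
disease = {"PHENO:1", "PHENO:2", "PHENO:3"}

# Made-up phenotypes observed in animal models of three hypothetical genes.
models = {
    "gene_x": {"PHENO:1", "PHENO:2", "PHENO:3", "PHENO:9"},
    "gene_y": {"PHENO:3", "PHENO:5"},
    "gene_z": {"PHENO:6"},
}

ranking = rank_candidate_genes(disease, models)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

The gene whose model phenotypes overlap most with the disease profile ranks first. A ranking like this can be evaluated against known disease-gene associations, which is the kind of check described in the abstract.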
