Abstract

Gene–phenotype associations play an important role in understanding disease mechanisms, which is a prerequisite for developing treatments. Some gene–phenotype associations are identified mainly through experiments and made publicly available through standard resources such as MGI. However, a vast number of gene–phenotype associations remain buried in the biomedical literature. Given the volume of this literature, automated text mining tools are needed to reduce the burden of manually curating gene–phenotype associations and to build comprehensive resources. In this study, we present an ontology-based approach, combined with statistical methods, to text mine gene–phenotype associations from the literature. To the best of our knowledge, this is the first study to use an ontology-based approach to extract gene–phenotype associations from the literature. Our method achieved AUC values of 0.90 and 0.75 in recovering known gene–phenotype associations from HPO and MGI, respectively. We posit that candidate genes and their relevant diseases should be described with similar phenotypes in publications. We therefore demonstrate the utility of our approach by predicting disease candidate genes based on the semantic similarity between the phenotypes associated with genes and those associated with diseases. We evaluated this candidate gene prediction model on gene–disease associations from MGI, where it achieved AUC values of 0.90 and 0.87 on the OMIM (human) and MGI (mouse) gene–disease datasets, respectively. Manual analysis of the text-mined data showed that our method accurately extracts gene–phenotype associations that are not yet covered by existing public gene–phenotype resources. Overall, the results indicate that our method can precisely extract both known and new gene–phenotype associations from the literature. All data and methods are available at https://github.com/bio-ontology-research-group/genepheno.
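The candidate gene prediction and AUC evaluation described above can be illustrated with a small sketch. It is not the authors' implementation: the ontology, the term IDs (`P:...`), the genes (`GENE_A` and so on), and the gold standard are all hypothetical, and the similarity measure is a deliberately simple swapped-in choice (Jaccard index over ancestor-closed phenotype sets, i.e. simGIC without information-content weights). The authors' actual measure may differ. Genes are ranked by phenotype similarity to a disease profile, and the ranking is scored with ROC AUC computed as the probability that a known gene outranks a non-associated one.

```python
from itertools import product

# Hypothetical toy phenotype ontology: term -> set of direct superclasses.
PARENTS = {
    "P:root": set(),
    "P:neuro": {"P:root"},
    "P:seizure": {"P:neuro"},
    "P:ataxia": {"P:neuro"},
    "P:skeletal": {"P:root"},
    "P:short_stature": {"P:skeletal"},
}

def ancestors(term):
    """Return the term plus all of its superclasses (reflexive-transitive closure)."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def closure(terms):
    """Union of the ancestor closures of a set of phenotype terms."""
    out = set()
    for t in terms:
        out |= ancestors(t)
    return out

def similarity(a_terms, b_terms):
    """Jaccard index of ancestor-closed phenotype sets (unweighted simGIC)."""
    a, b = closure(a_terms), closure(b_terms)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def roc_auc(scores, labels):
    """AUC as P(positive scores higher than negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

# Hypothetical text-mined gene phenotypes and a single disease profile.
GENE_PHENOTYPES = {
    "GENE_A": {"P:seizure", "P:ataxia"},
    "GENE_B": {"P:seizure"},
    "GENE_C": {"P:short_stature"},
}
DISEASE = {"P:seizure", "P:ataxia"}
KNOWN = {"GENE_A"}  # hypothetical gold-standard gene for this disease

genes = sorted(GENE_PHENOTYPES)
scores = [similarity(GENE_PHENOTYPES[g], DISEASE) for g in genes]
labels = [g in KNOWN for g in genes]
ranking = sorted(genes, key=lambda g: -similarity(GENE_PHENOTYPES[g], DISEASE))
print(ranking, roc_auc(scores, labels))  # ['GENE_A', 'GENE_B', 'GENE_C'] 1.0
```

Closing phenotype sets under their superclasses lets a gene annotated only to "seizure" still share the broader "neuro" term with the disease, which is the basic reason ontology-aware similarity outperforms exact term matching.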

Highlights

  • Phenotypes are the observable characteristics of an organism resulting from its genotype and its response to the environment

  • We developed a method to mine gene–phenotype associations from the literature, using phenotype ontologies as background knowledge

  • Central Nervous System (CNS) inflammation (MP:0006082) has the subclass Brain inflammation (MP:0001847) in the Mammalian Phenotype Ontology (MP); Brain inflammation is further inferred to be equivalent to the class Encephalitis (HP:0002383) in the PhenomeNET ontology [7]
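The inference in the last highlight can be sketched as a small graph traversal. This is a minimal illustration, not PhenomeNET itself and not an OWL reasoner: it encodes only the three classes named above, treats a subclass axiom as a one-way "implies" edge and an equivalence as edges in both directions, and the helper names (`build_graph`, `entailed_classes`) are hypothetical. A text mention annotated with Encephalitis (HP:0002383) then also counts as evidence for Brain inflammation and CNS inflammation in MP, which is what allows phenotypes to be compared across species.

```python
from collections import deque

# Toy fragment containing only the classes named in the highlight.
# Subclass axioms: class -> set of direct superclasses.
SUBCLASS_OF = {
    "MP:0001847": {"MP:0006082"},  # Brain inflammation -> CNS inflammation
}
# Equivalence inferred in PhenomeNET: Brain inflammation == Encephalitis.
EQUIVALENT = [("MP:0001847", "HP:0002383")]

def build_graph(subclass_of, equivalent):
    """Directed graph where an edge A -> B means an annotation to A entails B."""
    graph = {}
    for child, parents in subclass_of.items():
        graph.setdefault(child, set()).update(parents)
    for a, b in equivalent:
        # Equivalent classes entail each other in both directions.
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def entailed_classes(term, graph):
    """All classes entailed by an annotation to `term`, including itself (BFS)."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for nxt in graph.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

GRAPH = build_graph(SUBCLASS_OF, EQUIVALENT)
print(sorted(entailed_classes("HP:0002383", GRAPH)))
# ['HP:0002383', 'MP:0001847', 'MP:0006082']
```

Entailment only flows upward along subclass edges: an annotation to the broader CNS inflammation class does not imply the narrower Brain inflammation or Encephalitis classes.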


Summary

Introduction

Phenotypes are the observable characteristics of an organism resulting from its genotype and its response to the environment. Associations between genotypes and phenotypes shed light on disease mechanisms, as they provide a way of observing the indirect consequences of multi-scale physiological interactions within an organism. The diversity of phenotypes makes it challenging to represent them in a way that is comparable within and across databases. In response to this challenge, phenotype ontologies have been developed that formally represent phenotypes in several species and enable their integration and comparison [4]. While most phenotype ontologies were species-specific, covering only one or a few related species, there has recently been significant effort to integrate them so that phenotypes can be compared and jointly analyzed across species [5,6,7].

