Abstract

Identifying molecular connections between developmental processes and disease can lead to new hypotheses about health risks at all stages of life. Here we introduce a new approach to identifying significant connections between gene sets and disease genes, and apply it to several gene sets related to human development. To overcome the limits of incomplete and imperfect information linking genes to disease, we pool genes within disease subtrees in the MeSH taxonomy, and we demonstrate that such pooling improves the power and accuracy of our approach. Significance is assessed through permutation. We created a web-based visualization tool to facilitate multi-scale exploration of this large collection of significant connections (http://gda.cs.tufts.edu/development). High-level analysis of the results reveals expected connections between tissue-specific developmental processes and diseases linked to those tissues, and widespread connections to developmental disorders and cancers. Yet interesting new hypotheses may be derived from examining the unexpected connections. We highlight and discuss the implications of three such connections, linking dementia with bone development, polycystic ovary syndrome with cardiovascular development, and retinopathy of prematurity with lung development. Our results provide additional evidence that plays a key role in the early pathogenesis of polycystic ovary syndrome. Our evidence also suggests that the VEGF pathway and downstream NFKB signaling may explain the complex relationship between bronchopulmonary dysplasia and retinopathy of prematurity, and may form a bridge between two currently competing hypotheses about the molecular origins of bronchopulmonary dysplasia. Further data exploration and similar queries about other gene sets may generate a variety of new information about the molecular relationships between additional diseases.
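As a rough illustration of the two ideas named in the abstract (pooling disease genes over a taxonomy subtree, and assessing overlap significance by permutation), the following is a minimal sketch and not the authors' implementation. The function names, the toy taxonomy, the gene identifiers, and the choice of null model (random gene sets of the same size drawn from a background list) are all assumptions made for this example; the abstract does not specify the permutation scheme.

```python
import random


def pool_subtree(children, gene_map, root):
    """Union of genes annotated to `root` or to any of its descendants.

    `children` maps a term to its child terms; `gene_map` maps a term
    to the genes annotated directly to it. Both are hypothetical inputs.
    """
    pooled, stack = set(), [root]
    while stack:
        term = stack.pop()
        pooled |= gene_map.get(term, set())
        stack.extend(children.get(term, []))
    return pooled


def permutation_p_value(gene_set, disease_genes, background, n_perm=1000, seed=0):
    """Empirical p-value for the overlap between two gene collections.

    Assumed null model: random gene sets of the same size as `gene_set`,
    drawn without replacement from `background`.
    """
    rng = random.Random(seed)
    universe = sorted(background)  # sorted so results are reproducible
    observed = len(gene_set & disease_genes)
    hits = 0
    for _ in range(n_perm):
        sample = set(rng.sample(universe, len(gene_set)))
        if len(sample & disease_genes) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy taxonomy: one disease term with two children and one grandchild.
children = {"D1": ["D1.1", "D1.2"], "D1.2": ["D1.2.1"]}
gene_map = {"D1": {"g1"}, "D1.1": {"g2"}, "D1.2.1": {"g3", "g4"}}
background = {f"g{i}" for i in range(1, 101)}

pooled = pool_subtree(children, gene_map, "D1")
development_set = {"g1", "g2", "g3", "g4", "g5"}
p_value = permutation_p_value(development_set, pooled, background)
```

In this toy case no single term carries more than two genes, whereas the pooled subtree carries four, which is the sense in which pooling can strengthen a weak per-term signal.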

Highlights

  • The study of the health implications of developmental processes has entered the genomic era

  • We created a novel approach and tool to assess the overrepresentation of various developmental gene sets among groups of genes linked to specific diseases

  • Gene sets derived from the Gene Ontology that include the DFLAT annotation have been shown to improve the interpretability of gene expression data related to human development [18], so they are a reasonable choice for the analysis described here



Introduction

The study of the health implications of developmental processes has entered the genomic era. We hypothesized that by examining the relationships between sets of genes related to specific developmental processes and reported disease genes, we could develop novel insights into developmental impacts on health. We note that a similar principle, that of pooling many weak signals to provide a stronger one, has led to the creation of many highly effective "gene-set analysis" methods for expression data [7,8] and genome-wide association data [9]. These approaches are inappropriate for assessing the overlap of
