Abstract

Functional annotation of genes remains a challenge in fundamental biology and a limiting factor for translational medicine. Computational approaches have been developed to process heterogeneous data into meaningful metrics, but they often do not address how findings might be updated when new evidence comes to light. To address this challenge, we describe the requirements for a framework for incremental data integration and propose an implementation based on phenotype ontologies and Bayesian probability updates. We apply the framework to quantify similarities between gene annotations and disease profiles. Within this scope, we categorize human diseases according to how well they can be recapitulated by animal models, and we quantify similarities between human diseases and mouse models produced by the International Mouse Phenotyping Consortium. The flexibility of the approach allows us to incorporate negative phenotypic data to better prioritize candidate genes, and to stratify disease mapping using sex-dependent phenotypes. All our association scores can be updated, and we exploit this feature to showcase integration with curated annotations from high-precision assays. Incremental integration is thus a suitable framework for tracking functional annotations and linking them to complex human pathology.
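The abstract describes incremental integration through Bayesian probability updates. The sketch below shows that general idea in a generic form and is not the authors' implementation. It makes several assumptions. It treats a single gene-disease hypothesis. The per-phenotype likelihoods are made up. Observations are assumed to be conditionally independent. The `Observation` class and its fields are hypothetical. Negative phenotypic data (a phenotype that was tested and found normal) enters through the complementary likelihoods.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Observation:
    """One phenotype test result for a candidate gene (hypothetical schema)."""
    phenotype: str
    abnormal: bool      # True: phenotype observed; False: tested and normal (negative data)
    p_if_match: float   # assumed P(abnormal | gene models the disease)
    p_if_null: float    # assumed P(abnormal | gene does not model the disease)


def update(prior: float, obs: Observation) -> float:
    """Return the posterior probability of the association after one Bayesian update."""
    if obs.abnormal:
        like_h, like_not_h = obs.p_if_match, obs.p_if_null
    else:
        # Negative data: use the probabilities of observing a normal phenotype.
        like_h, like_not_h = 1.0 - obs.p_if_match, 1.0 - obs.p_if_null
    numerator = like_h * prior
    return numerator / (numerator + like_not_h * (1.0 - prior))


def update_all(prior: float, observations: Iterable[Observation]) -> float:
    """Fold a batch of observations into a score; the result can serve as the next prior."""
    p = prior
    for obs in observations:
        p = update(p, obs)
    return p


if __name__ == "__main__":
    # Illustrative values only; phenotype labels and likelihoods are placeholders.
    first_release = [Observation("phenotype_A", True, 0.8, 0.1)]
    later_release = [Observation("phenotype_B", False, 0.6, 0.05)]
    score = update_all(0.01, first_release)   # score after the first data release
    score = update_all(score, later_release)  # the same score, updated with new evidence
    print(f"updated association score: {score:.4f}")
```

Under the independence assumption, each update multiplies the prior odds by a likelihood ratio. Folding in evidence batch by batch therefore gives the same score as processing it all at once, so an existing score can be revised without recomputing it from scratch.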

Highlights

  • Technological advances mean that a growing range of assays are available to probe biological systems, from profiling of single cells to phenotyping of entire organisms

  • We describe requirements for a framework for incremental data integration and propose an implementation based on phenotype ontologies and Bayesian probability updates

  • We categorize human diseases according to how well they can be recapitulated by animal models and quantify similarities between human diseases and mouse models produced by the International Mouse Phenotyping Consortium


Summary

Introduction

Technological advances mean that a growing range of assays are available to probe biological systems, from profiling of single cells to phenotyping of entire organisms. These data open possibilities to study the role of genes in fundamental biological processes as well as in disease. A complete understanding of gene function can only be achieved by synthesizing several lines of evidence [1,2]. Some aspects of this data-integration task are addressed by specialized approaches that, for example, analyze multi-omic data [3], meta-analyze related cohorts [4], or summarize the outputs of distinct computational approaches [5]. Methods are therefore needed that are compatible with a wide range of experimental workflows and that can capture an evolving state of knowledge.

