Abstract

The rapid advancement of technology in genomics and targeted genetic manipulation has made comparative biology an increasingly prominent strategy to model human disease processes. Predicting orthology relationships between species is a vital component of comparative biology. Dozens of strategies for predicting orthologs have been developed using combinations of gene and protein sequence, phylogenetic history, and functional interaction, with progressively increasing accuracy. A relatively new class of orthology prediction strategies combines aspects of multiple methods into meta-tools, resulting in improved prediction performance. Here we present WORMHOLE, a novel ortholog prediction meta-tool that applies machine learning to integrate 17 distinct ortholog prediction algorithms to identify novel least diverged orthologs (LDOs) between six eukaryotic species: humans, mice, zebrafish, fruit flies, nematodes, and budding yeast. Machine learning allows WORMHOLE to incorporate predictions from a wide spectrum of strategies in order to form aggregate predictions of LDOs with high confidence. In this study, we demonstrate the performance of WORMHOLE across each combination of query and target species. We show that WORMHOLE is particularly adept at improving LDO prediction performance between distantly related species, expanding the pool of LDOs while maintaining low evolutionary distance and a high level of functional relatedness between genes in LDO pairs. We present extensive validation, including cross-validated prediction of PANTHER LDOs and evaluation of evolutionary divergence and functional similarity, and discuss future applications of machine learning in ortholog prediction. A WORMHOLE web tool has been developed and is available at http://wormhole.jax.org/.

Highlights

  • Comparative biology has become a central strategy in the study of human biology and disease

  • We present a new tool called WORMHOLE that predicts a strict subclass of orthologs called least diverged orthologs (LDOs) with a high level of functional specificity by learning features of orthology that are encoded in the patterns of predictions made by 17 constituent methods

  • Each of the established ortholog prediction algorithms (Ensembl Compara, EggNOG, etc.) uses a different combination of first-order features (gene and protein sequence, phylogenetic history, and functional interaction) to generate predicted ortholog relationships, forming the first layer of prediction (Fig 1B). Together, these algorithms generate a pool of candidate orthologs and candidate LDOs, and their predictions can be treated as novel second-order features (Fig 1C)
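
The layered idea in the highlights above can be illustrated with a minimal sketch. It is not the WORMHOLE implementation: the classifier (a small logistic regression fitted by gradient descent), the three method names, and the toy training data are all assumptions made for illustration, and the real tool integrates 17 constituent methods. Each candidate gene pair is represented by a vector recording which constituent methods predicted it, and a classifier trained on reference labels learns how much weight each method's vote deserves.

```python
import math

# Hypothetical constituent methods; the names are placeholders, not real tools.
METHODS = ["method_a", "method_b", "method_c"]

def predict_score(x, weights, bias):
    """Logistic score in (0, 1) for one candidate gene pair."""
    z = bias + sum(w * xj for w, xj in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights by batch gradient descent."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict_score(x, weights, bias) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Toy second-order features: 1 if the method predicted the pair, else 0.
# Labels stand in for a reference set of known LDOs (invented for this sketch).
# In this toy set, method_a agrees with the reference; the others do not help.
features = [
    [1, 1, 0], [1, 1, 1], [1, 0, 0], [1, 0, 1],
    [0, 1, 0], [0, 0, 1], [0, 0, 0], [0, 1, 1],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic(features, labels)

# A pair supported by one reliable method versus two uninformative ones.
one_reliable_vote = predict_score([1, 0, 0], weights, bias)
two_weak_votes = predict_score([0, 1, 1], weights, bias)

for name, w in zip(METHODS, weights):
    print(f"{name}: weight {w:.2f}")
print(f"pair predicted only by method_a: {one_reliable_vote:.2f}")
print(f"pair predicted by method_b and method_c: {two_weak_votes:.2f}")
```

The contrast with simple vote counting is the point of the sketch: under these toy labels, a pair supported by a single informative method scores higher than a pair supported by two uninformative ones, because the weights are learned from the reference set instead of being fixed in advance.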

Introduction

Comparative biology has become a central strategy in the study of human biology and disease. The availability of powerful genetic tools and our ability to control experimental conditions in model organisms often allow a much more detailed examination than is possible by directly studying a process of interest in humans. In diverse areas of biology (aging, development, stem cell differentiation, behavior), highly conserved molecular features have been described in model systems, even in highly evolutionarily divergent organisms, and translated into useful interventions in humans. For example, the ability to delay aging by inhibition of the Target of Rapamycin (TOR) kinase was first discovered in the single-celled budding yeast Saccharomyces cerevisiae, and much of the work to characterize TOR signalling has been carried out in this system (reviewed by Loewith and Hall [1]). To reap the practical benefits of invertebrate models in studying the genetics of human health, it is crucial to translate molecular results from invertebrates into vertebrates.
