Abstract

Recent studies have shown evidence for the coevolution of functionally related genes. This coevolution results from constraints that maintain functional relationships between interacting proteins. These studies have focused on the correlation in gene tree branch lengths between proteins that interact directly with each other. Here we hypothesize that the correlation in branch lengths is not limited to proteins that interact directly, but extends to proteins that operate within the same pathway. Using generalized linear models as the basis for identifying correlation, we attempted to predict the gene ontology (GO) terms of a gene from its gene tree branch lengths. We applied our method to a dataset of proteins from ten prokaryotic species. The accuracy with which we could predict protein function from gene trees varied substantially among GO terms. In particular, our model could accurately predict genes involved in translation and certain ribosomal activities, with an area under the receiver operating characteristic (ROC) curve of up to 92%. Further analysis showed that the similarity between the trees of genes labeled with similar GO terms was not limited to genes that physically interacted, but also extended to genes functioning within the same pathway. We discuss the relevance of our findings to the use of phylogenetic methods in comparative genomics.
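
To make the prediction step concrete, the following is a minimal sketch of one way a GLM could map branch lengths to a GO-term label. It assumes a binomial GLM with a logit link (logistic regression), which the abstract does not specify, and fits it by plain gradient ascent. The function names, the learning rate, and the toy branch lengths and labels are all hypothetical and are not the authors' data or implementation.

```python
import math


def sigmoid(z):
    # Numerically stable logistic (inverse logit) function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w[0] is the intercept; w[1:] are per-branch coefficients.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic_glm(X, y, lr=0.5, epochs=2000):
    """Fit a logit-link binomial GLM by batch gradient ascent
    on the mean log-likelihood (assumed fitting procedure)."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


# Made-up toy data: each row is one gene's lengths on three tree branches.
toy_branch_lengths = [
    [0.10, 0.12, 0.08],
    [0.11, 0.10, 0.09],
    [0.09, 0.13, 0.10],
    [0.40, 0.35, 0.50],
    [0.45, 0.38, 0.42],
    [0.50, 0.30, 0.47],
]
# Made-up labels: 1 if the gene is annotated with the GO term, else 0.
toy_has_go_term = [1, 1, 1, 0, 0, 0]

weights = fit_logistic_glm(toy_branch_lengths, toy_has_go_term)
p_short = predict_proba(weights, [0.10, 0.11, 0.09])
p_long = predict_proba(weights, [0.45, 0.35, 0.45])
```

In this toy setting the genes with short branches carry the label, so `p_short` comes out above 0.5 and `p_long` below it. A real analysis would fit one such model per GO term and evaluate it on held-out genes.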

Highlights

  • Estimating lineage-specific substitution rates and divergence dates has become an increasingly important aspect of the reconstruction of evolutionary history [1,2,3,4]

  • For every possible combination of gene ontology (GO) biological process and molecular function, we found the number of genes that were involved in both GO terms

  • The predictions from the Generalized Linear Models (GLM) were converted to estimates of whether the gene is involved in a process for a range of cut-off values
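
The cut-off step in the last highlight can be sketched as follows. Each cut-off turns the continuous GLM predictions into yes/no calls, each set of calls gives one point on the ROC curve, and the area under that curve summarizes accuracy. The function names, the cut-off grid, and the toy scores and labels are hypothetical; the source does not state which cut-off values were used.

```python
def roc_points(scores, labels, cutoffs):
    """Return one (false positive rate, true positive rate) pair per cut-off.

    A gene is called 'involved in the process' when its score >= cut-off.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for c in cutoffs:
        calls = [s >= c for s in scores]
        tp = sum(1 for call, y in zip(calls, labels) if call and y == 1)
        fp = sum(1 for call, y in zip(calls, labels) if call and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(points)  # order by FPR, then TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up toy predictions and true annotations for eight genes.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]

# Assumed grid of cut-offs: 0.00, 0.05, ..., 1.00.
cutoffs = [i / 20 for i in range(21)]

points = roc_points(toy_scores, toy_labels, cutoffs)
auc = auc_trapezoid(points)
```

For these toy values the area works out to 0.875. The grid only recovers the full curve when it is fine enough to separate every distinct score, so using the sorted scores themselves as cut-offs is a common alternative.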


Summary

Introduction

Estimating lineage-specific substitution rates and divergence dates has become an increasingly important aspect of the reconstruction of evolutionary history [1,2,3,4]. Differences in substitution rates from lineage to lineage have been attributed to variation in neutral rates of substitution, population size, generation times, and selective forces. Together, these are responsible for the non-ultrametric distances on a tree [5,6] and give rise to lineage-specific variation in molecular evolutionary rates. The molecular evidence for such specific selection-mediated substitutions has been the subject of much research since the pioneering paper of Messier and Stewart [9,10,11,12,13,14]. These selection-mediated substitutions are by definition non-neutral and would not be expected to be consistent across genes or across lineages.

