Abstract

A common assumption in comparative genomics is that orthologous genes share greater functional similarity than do paralogous genes (the “ortholog conjecture”). Many methods used to computationally predict protein function are based on this assumption, even though it is largely untested. Here we present the first large-scale test of the ortholog conjecture using comparative functional genomic data from human and mouse. We use the experimentally derived functions of more than 8,900 genes, as well as an independent microarray dataset, to directly assess our ability to predict function using both orthologs and paralogs. Both datasets show that paralogs are often a much better predictor of function than are orthologs, even at lower sequence identities. Among paralogs, those found within the same species are consistently more functionally similar than those found in a different species. We also find that paralogous pairs residing on the same chromosome are more functionally similar than those on different chromosomes, perhaps due to higher levels of interlocus gene conversion between these pairs. In addition to offering implications for the computational prediction of protein function, our results shed light on the relationship between sequence divergence and functional divergence. We conclude that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act.
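The comparison described above can be illustrated with a small sketch. It scores the functional similarity of a gene pair as the Jaccard index of the two genes' GO term sets, then averages that score over ortholog pairs and over paralog pairs. This is an assumption made for illustration, not necessarily the similarity measure the study used. The gene names, GO term IDs, and pair labels are hypothetical toy values, not data from the study. The sketch also ignores sequence identity, which the study took into account when comparing orthologs and paralogs.

```python
from statistics import mean


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of GO terms shared by two genes (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy annotations: gene -> set of GO term IDs (not data from the study).
annotations = {
    "human_A": {"GO:0001", "GO:0002", "GO:0003"},
    "mouse_A": {"GO:0001", "GO:0004"},
    "human_B": {"GO:0001", "GO:0002"},
}

# Hypothetical pairs grouped by homology type; A and B are assumed to be
# paralogs that duplicated before the human-mouse split.
pairs = {
    "ortholog": [("human_A", "mouse_A")],
    "within-species paralog": [("human_A", "human_B")],
    "between-species paralog": [("human_B", "mouse_A")],
}


def mean_similarity(pair_list: list[tuple[str, str]]) -> float:
    """Average Jaccard similarity over a list of gene pairs."""
    return mean(jaccard(annotations[x], annotations[y]) for x, y in pair_list)


for kind, pair_list in pairs.items():
    print(f"{kind}: {mean_similarity(pair_list):.3f}")
```

With real data, each homology class would contain thousands of pairs. The per-class means would then be compared within bins of sequence identity, so that orthologs and paralogs of similar divergence are set side by side.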

Highlights

  • The potential for gene duplication to generate evolutionary novelty was first noted in 1918 by Calvin Bridges, and the idea quickly found many supporters [2,3,4]

  • A guiding principle in the assignment of function from one organism to another is that single-copy genes (“orthologs”) are statistically more likely to provide functional information than are multi-copy genes, whether in the same organism or different organisms

  • The experiments used to annotate these genes come from 12,204 unique published papers whose results are collected in the Gene Ontology (GO) database; in a later section we carry out an independent analysis using microarray data to measure functional similarity

Summary

Introduction

The potential for gene duplication to generate evolutionary novelty was first noted in 1918 by Calvin Bridges (cited in [1]), and the idea quickly found many supporters [2,3,4]. As the first protein-sequence data became available, Zuckerkandl and Pauling [7] distinguished between “duplication-independent homology” and “duplication-dependent homology,” which we refer to as orthology and paralogy, respectively [8,9]. They recognized that the paralogous α-, β-, and γ-hemoglobin chains present in all jawed vertebrates were less functionally similar to each other than were orthologous copies between closely related species, largely because the paralogs had diverged over a very long period of time. Similar statements can be found in many papers (e.g. [11,12,13,14,15,16,17,18]), and, as pointed out by Studer and Robinson-Rechavi [19], can even be found in the primer on phylogenetics at the National Center for Biotechnology Information (NCBI) website (http://www.ncbi.nlm.nih.gov/About/primer/phylo.html).
