Abstract

Background: Mammalian genome sequence data are being acquired in large quantities and at enormous speeds. We now have a tremendous opportunity to better understand which genes are the most variable or conserved, and what their particular functions and evolutionary dynamics are, through comparative genomics.

Results: We chose human and eleven other high-coverage mammalian genome data, as well as an avian genome as an outgroup, to analyze orthologous protein-coding genes using nonsynonymous (Ka) and synonymous (Ks) substitution rates. After evaluating eight commonly-used methods of Ka and Ks calculation, we observed that these methods yielded a nearly uniform result when estimating Ka, but not Ks (or Ka/Ks). When sorting genes based on Ka, we noticed that fast-evolving and slow-evolving genes often belonged to different functional classes, with respect to species-specificity and lineage-specificity. In particular, we identified two functional classes of genes in the acquired immune system. Fast-evolving genes coded for signal-transducing proteins, such as receptors, ligands, cytokines, and CDs (cluster of differentiation, mostly surface proteins), whereas the slow-evolving genes were for function-modulating proteins, such as kinases and adaptor proteins. In addition, among slow-evolving genes that had functions related to the central nervous system, neurodegenerative disease-related pathways were enriched significantly in most mammalian species. We also confirmed that gene expression was negatively correlated with evolution rate, i.e. slow-evolving genes were expressed at higher levels than fast-evolving genes. Our results indicated that the functional specializations of the three major mammalian clades were: sensory perception and oncogenesis in primates, reproduction and hormone regulation in large mammals, and immunity and angiotensin in rodents.

Conclusion: Our study suggests that Ka calculation, which is less biased compared to Ks and Ka/Ks, can be used as a parameter to sort genes by evolution rate and can also provide a way to categorize common protein functions and define their interaction networks, either pair-wise or in defined lineages or subgroups. Evaluating gene evolution based on Ka and Ks calculations can be done with large datasets, such as mammalian genomes.

Reviewers: This article has been reviewed by Drs. Anamaria Necsulea (nominated by Nicolas Galtier), Subhajyoti De (nominated by Sarah Teichmann) and Claus O. Wilke.

Highlights

  • Mammalian genome sequence data are being acquired in large quantities and at enormous speeds

  • Following the publication of the complete human genome sequence [2], over a dozen mammalian genomes have been sequenced, allowing mammalian comparative genomics to come of age

  • We found that the number of Ns in all coding sequences (CDS) fell within reasonable ranges: (1) the number of Ns/the number of nucleotides = 0.00002740 ± 0.00059475; (2) the total number of orthologs containing Ns/total number of orthologs × 100% = 1.5084%
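
The two N-content figures in the last highlight are simple ratios, and a minimal sketch of how such figures could be computed is given here. It assumes that the "±" value is a standard deviation over per-sequence N fractions and that each ortholog is represented by one CDS string; the source states neither, and the function name `n_content_stats` and the toy sequences are hypothetical, not taken from the paper.

```python
from statistics import mean, pstdev


def n_content_stats(cds_by_ortholog):
    """Summarize ambiguous-base (N) content over a set of coding sequences.

    Returns a tuple:
      (mean per-sequence N fraction,
       population standard deviation of that fraction,
       percentage of sequences containing at least one N).
    Assumption: one CDS string per ortholog; empty sequences are skipped.
    """
    fractions = []
    with_n = 0
    for seq in cds_by_ortholog.values():
        if not seq:
            continue  # avoid division by zero on empty records
        n_count = seq.upper().count("N")
        fractions.append(n_count / len(seq))
        if n_count > 0:
            with_n += 1
    if not fractions:
        raise ValueError("no non-empty sequences supplied")
    percent_with_n = 100.0 * with_n / len(fractions)
    return mean(fractions), pstdev(fractions), percent_with_n


# Hypothetical toy data: only geneA contains Ns (3 of 12 bases).
toy = {
    "geneA": "ATGGCCNNNTGA",
    "geneB": "ATGAAATGA",
    "geneC": "ATGCCCGGGTAA",
    "geneD": "ATGTTTTAG",
}
mean_frac, sd_frac, pct_with_n = n_content_stats(toy)
print(f"N fraction = {mean_frac:.4f} ± {sd_frac:.4f}; orthologs with Ns = {pct_with_n:.1f}%")
```

On real data the same function would be applied to the full set of ortholog CDS records; whether the standard deviation should be the population or the sample form is a choice the source does not specify.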

Summary

Introduction

Mammalian genome sequence data are being acquired in large quantities and at enormous speeds. One form of genetic variation concerns gene gain-and-loss, which is related to the amplification and deletion of certain genes and their chromosomal regions. This is an important evolutionary mechanism that shapes mammalian genomes through natural selection; it leads to gene family expansion and deletion, which has been proposed to be one molecular origin of chimp-human evolution [3]. Another form of genetic variation is sequence variation at specific nucleotide sites in protein-coding genes. Such variations become functionally relevant when they alter protein sequences.
