Abstract

Background

Nucleotide and amino acid substitution tendencies are characteristic of each species, organelle, and protein family. Hence, various empirical amino acid substitution rate matrices have had to be estimated for phylogenetic analysis: JTT, WAG, and LG for nuclear proteins, mtREV for mitochondrial proteins, cpREV10 and cpREV64 for chloroplast-encoded proteins, and FLU for influenza proteins. In a mechanistic codon substitution model, by contrast, each codon substitution rate is proportional to the product of a codon mutation rate and a fixation ratio that depends on the type of amino acid replacement, so mutation rates and the strength of selective constraint on amino acids can be tailored to each protein family with 11 additional parameters. As a result, in the evolutionary analysis of codon sequences, it outperforms codon substitution models equivalent to empirical amino acid substitution matrices. Is it superior even for amino acid sequences, in which synonymous substitutions cannot be identified?

Results

Nucleotide mutations are assumed to occur independently of codon position, but multiple nucleotide changes in infinitesimal time are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene with a linear function of a given estimate of selective constraints, which was obtained by maximizing the likelihood of an empirical amino acid or codon substitution frequency matrix: JTT, WAG, LG, or KHG. The mechanistic codon substitution model with the assumption of equal codon usage yields better Akaike and Bayesian information criterion values for all three phylogenetic trees, of mitochondrial, chloroplast, and influenza-A hemagglutinin proteins, than the empirical amino acid substitution models with mtREV, cpREV64, and FLU, respectively, which were designed specifically for those protein families.

Variation of selective constraint across sites fits the datasets significantly better than variation of codon mutation rate. This confirms that the substitution rate variation across sites detected by amino acid substitution models is caused primarily by variation of selective constraint against amino acid substitutions rather than by variation of codon mutation rate.

Conclusions

The mechanistic codon substitution model is superior to amino acid substitution models even in the evolutionary analysis of protein sequences.

Highlights

  • Nucleotide and amino acid substitution tendencies are characteristic of each species, organelle, and protein family

  • On the basis of the Akaike Information Criterion (AIC) [32] and the Bayesian Information Criterion (BIC) [33], amino acid substitution models with the empirical rate matrices cpREV64 [28], cpREV10 [8], mtREV [6], and FLU [29], as well as JTT [5], WAG [10], and LG [11], which were estimated from nuclear proteins, are compared with mechanistic codon substitution models [26,27] whose selective constraint matrices were estimated from JTT, WAG, LG, and KHG [25]. Three datasets are used: fast-evolving interspecific mitochondrial proteins concatenating 12 protein-coding genes from 69 mammalian species [34]; closely related chloroplast-encoded proteins concatenating 52 protein-coding genes from 55 chloroplast genomes of the major angiosperm lineages [35]; and HA proteins of human influenza-A H1N1 (HA_Human-FluA-H1N1), consisting of 1309 sequences

  • The AIC and BIC values for these 3 datasets are listed in Tables 3, 4, and 5, respectively
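Both criteria trade off fit against the number of free parameters, which is why a codon model with extra parameters can be compared fairly with a parameter-free empirical matrix. The following is a minimal sketch of the standard definitions, not the authors' code; the function names are illustrative, and taking the BIC sample size to be the number of alignment sites is an assumption rather than something stated here.

```python
import math


def aic(log_likelihood, num_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * num_params - 2.0 * log_likelihood


def bic(log_likelihood, num_params, sample_size):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better).

    sample_size is assumed here to be the number of alignment sites.
    """
    return num_params * math.log(sample_size) - 2.0 * log_likelihood


def delta_from_best(scores):
    """Return each model's score minus the best (smallest) score."""
    best = min(scores.values())
    return {name: value - best for name, value in scores.items()}


if __name__ == "__main__":
    # Hypothetical log-likelihoods and parameter counts, for illustration only.
    fits = {
        "empirical_matrix": (-10250.0, 0),
        "mechanistic_codon": (-10100.0, 11),
    }
    aic_scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    for name, delta in delta_from_best(aic_scores).items():
        print(name, "delta AIC =", delta)
```

Reporting differences from the best model, as `delta_from_best` does, makes the comparison independent of the absolute scale of the log-likelihood.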


Summary

Introduction

Nucleotide and amino acid substitution tendencies are characteristic of each species, organelle, and protein family. Empirical amino acid substitution rate matrices have been estimated from a large number of substitutions inferred on phylogenetic trees of single or many protein families: the JTT [5], WAG [10], and LG [11] matrices from nuclear proteins, mtREV [6] from vertebrate mitochondrial proteins, cpREV10 [8] and cpREV64 [28] from chloroplast-encoded proteins, and FLU [29] from influenza proteins. A rate matrix derived from a specific protein family, such as mtREV, cpREV64, or FLU, represents substitution tendencies characteristic of that family but is often not generic enough to be applied to other protein families. We propose a different approach: employing a mechanistic codon substitution model in which the biological and evolutionary mechanisms of amino acid substitutions are taken into account.
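The core idea of a mechanistic model, that each substitution rate is proportional to a mutation rate times a fixation ratio, can be illustrated with a toy rate matrix. This is only a sketch under simplifying assumptions: it uses three abstract states instead of 61 sense codons, omits codon frequencies and rate normalization, and does not reproduce the paper's actual parameterization. The function name and all numbers are hypothetical.

```python
def build_rate_matrix(mutation, fixation):
    """Build a toy substitution rate matrix.

    Off-diagonal entry (i, j) is mutation[i][j] * fixation[i][j];
    each diagonal entry is set so that the row sums to zero, as
    required for a continuous-time Markov rate matrix.
    """
    n = len(mutation)
    rates = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                rates[i][j] = mutation[i][j] * fixation[i][j]
        # The diagonal is still 0.0 here, so this sums the off-diagonals.
        rates[i][i] = -sum(rates[i])
    return rates


if __name__ == "__main__":
    # Hypothetical mutation rates between three states.
    mutation = [
        [0.0, 1.0, 2.0],
        [1.0, 0.0, 1.0],
        [2.0, 1.0, 0.0],
    ]
    # Hypothetical fixation ratios: 1.0 for an unconstrained change,
    # below 1.0 for a replacement under selective constraint.
    fixation = [
        [1.0, 1.0, 0.5],
        [1.0, 1.0, 0.2],
        [0.5, 0.2, 1.0],
    ]
    for row in build_rate_matrix(mutation, fixation):
        print(row)
```

Separating the two factors is what lets mutation tendencies and selective constraints be adjusted independently for each protein family, rather than being fixed together in a single empirical matrix.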

