Abstract

In theory, codon models that account both for the dependence of nucleotide substitutions among codon positions and for differences between synonymous and non-synonymous changes best describe sequence evolution in protein-coding genes. In practice, however, we know little about how often the assumptions underlying codon model-based estimates are violated, or how severe the resulting artifacts may be. In nucleotide-based phylogenies inferred from the first and second codon positions of a concatenated plastid gene data set, two distantly related lineages, the dinoflagellate and haptophyte plastids, were robustly grouped together. This artifactual grouping is attributed to parallel heterogeneity in leucine (Leu) and serine (Ser) codon usage in the data set. Using this data set, we demonstrate that when the model assumption of homogeneous codon composition is violated, codon-based phylogenetic estimates are seriously biased, robustly uniting the dinoflagellate and haptophyte plastids into a monophyletic clade. Our results suggest that similar artifacts may arise in codon model-based estimates from codon usage heterogeneity in any amino acid. We therefore advise confirming that codon usage is homogeneous across the taxa in a data set before attempting codon model-based phylogenetic estimation.
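As a minimal sketch of the kind of homogeneity check recommended here, and assuming a codon-aligned, in-frame nucleotide alignment with standard sense-codon assignments, per-taxon counts of the six Leu (or Ser) codons can be compared with a Pearson chi-square test of homogeneity. The function names (`codon_counts`, `homogeneity_chi2`) and the toy sequences are hypothetical illustrations, not the procedure used in this study. The test also treats taxa as independent samples, ignoring their shared phylogenetic history, and the chi-square approximation is unreliable when expected counts are small.

```python
from collections import Counter

# Six-fold degenerate codon sets, assuming standard sense-codon assignments
# (NCBI translation table 11, used for plastids, shares them).
LEU_CODONS = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")
SER_CODONS = ("TCT", "TCC", "TCA", "TCG", "AGT", "AGC")


def codon_counts(seq, codons):
    """Count each codon of `codons` in an in-frame, codon-aligned sequence.

    Triplets containing gaps or ambiguity codes simply fail to match and
    are ignored, so the reading frame is never shifted.
    """
    seq = seq.upper()
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    counts = Counter(t for t in triplets if t in codons)
    return [counts[c] for c in codons]


def homogeneity_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for a taxa x codon table.

    Codons absent from every taxon and taxa with no matching codons are
    dropped first, since they would give zero expected counts.
    """
    cols = [j for j in range(len(table[0])) if any(row[j] for row in table)]
    rows = [[row[j] for j in cols] for row in table]
    rows = [r for r in rows if sum(r) > 0]
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(r[j] for r in rows) for j in range(len(cols))]
    n = sum(row_tot)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, observed in enumerate(r):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df


if __name__ == "__main__":
    # Hypothetical toy alignment: two taxa, four Leu codons each.
    alignment = {
        "taxon_A": "TTACTTTTATTA",
        "taxon_B": "CTGCTGCTCCTG",
    }
    table = [codon_counts(s, LEU_CODONS) for s in alignment.values()]
    stat, df = homogeneity_chi2(table)
    print(f"Leu codon usage: chi2 = {stat:.2f}, df = {df}")
    # prints: Leu codon usage: chi2 = 8.00, df = 3
```

A statistic well above the chi-square critical value for its degrees of freedom (about 7.81 for df = 3 at the 5% level) would flag heterogeneous codon usage for that amino acid. For tables with more than one degree of freedom, `scipy.stats.chi2_contingency`, which applies a continuity correction only when df = 1, should return the same statistic together with a p-value.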
