Abstract

We examined a broad selection of protein-coding loci from a diverse array of clades and genomes to quantify three factors that determine whether nucleotide or amino acid characters should be preferred for phylogenetic inference. First, we quantified the difference in observed character-state space between nucleotides and amino acids. Second, we quantified the loss of potential phylogenetic signal from silent substitutions when amino acids are used. Third, we used the disparity index to quantify the relative compositional heterogeneity of nucleotides and amino acids and then determined how commonly convergent (rather than unique) shifts in nucleotide and amino acid composition occur in a phylogenetic context. The greater potential phylogenetic signal for nucleotide characters was found to be enormous (on average 440% that of amino acids), whereas the greater observed character-state space for amino acids was less impressive (on average 150.4% that of nucleotides). While matrices of amino acid sequences had less compositional heterogeneity than their corresponding nucleotide sequences, heterogeneity in amino acid composition may be more homoplasious than heterogeneity in nucleotide composition. Given the ability of increased taxon sampling to better utilize the greater potential phylogenetic signal of nucleotide characters and decrease the potential for artifacts caused by heterogeneous nucleotide composition among taxa, we suggest that increased taxon sampling be performed whenever possible instead of restricting analyses to amino acid characters.
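As a rough illustration of the first two quantities, the sketch below counts observed character states per site for nucleotides and for their translated amino acids, and counts codon columns whose variation is entirely silent. It is not the authors' pipeline. The function names (`translate`, `states_per_column`, `summarize`) are hypothetical, and the code assumes in-frame, gap-free, equal-length sequences read with the standard genetic code.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an in-frame nucleotide string (assumed gap-free, uppercase)."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def states_per_column(seqs):
    """Number of distinct character states observed in each alignment column."""
    return [len(set(col)) for col in zip(*seqs)]


def summarize(nt_seqs):
    """Compare observed variation at the nucleotide and amino acid levels."""
    aa_seqs = [translate(s) for s in nt_seqs]
    nt_states = states_per_column(nt_seqs)
    aa_states = states_per_column(aa_seqs)

    # Codon columns that vary in nucleotides but not in amino acids:
    # variation that is discarded when only amino acids are analyzed.
    silent_only = 0
    for i, n_aa in enumerate(aa_states):
        codons = {s[3 * i:3 * i + 3] for s in nt_seqs}
        if len(codons) > 1 and n_aa == 1:
            silent_only += 1

    return {
        "variable_nt_sites": sum(1 for n in nt_states if n > 1),
        "variable_aa_sites": sum(1 for n in aa_states if n > 1),
        "max_nt_states": max(nt_states),
        "max_aa_states": max(aa_states),
        "silent_only_codons": silent_only,
    }


if __name__ == "__main__":
    # Toy alignment of three taxa, three codons each (made-up data).
    toy = ["ATGTTTGGA", "ATGTTCGGA", "ATGTTAGGC"]
    print(summarize(toy))
```

In the toy alignment, the third codon varies (GGA versus GGC) but always encodes glycine, so that variation is visible only to a nucleotide-based analysis.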
