Recoding Amino Acids to a Reduced Alphabet may Increase or Decrease Phylogenetic Accuracy.

Peter G Foster, T Martin Embley, Cymon J Cox, Dominik Schrempf, Tom A Williams, Gergely J Szöllősi

doi:10.1093/sysbio/syac042

Abstract

Common molecular phylogenetic characteristics such as long branches and compositional heterogeneity can be problematic for phylogenetic reconstruction when using amino acid data. Recoding alignments to reduced alphabets before phylogenetic analysis has often been used both to explore and potentially decrease the effect of such problems. We tested the effectiveness of this strategy on topological accuracy using simulated data on four-taxon trees. We simulated alignments in phylogenetically challenging ways to test the phylogenetic accuracy of analyses using various recoding strategies together with commonly used homogeneous models. We tested three recoding methods based on amino acid exchangeability, and another recoding method based on lowering the compositional heterogeneity among alignment sequences as measured by the Chi-squared statistic. Our simulation results show that on trees with long branches where sequences approach saturation, accuracy was not greatly affected by exchangeability-based recodings, but Chi-squared-based recoding decreased accuracy. We then simulated sequences with different kinds of compositional heterogeneity over the tree. Recoding often increased accuracy on such alignments. Exchangeability-based recoding was rarely worse than not recoding, and often considerably better. Recoding based on lowering the Chi-squared value improved accuracy in some cases but not in others, suggesting that low compositional heterogeneity by itself is not sufficient to increase accuracy in the analysis of these alignments. We also simulated alignments using site-specific amino acid profiles, making sequences that had compositional heterogeneity over alignment sites. Exchangeability-based recoding coupled with site-homogeneous models had poor accuracy for these data sets but Chi-squared-based recoding on these alignments increased accuracy. We then simulated data sets that were compositionally both site- and tree-heterogeneous, like many real data sets. 
The effect on the accuracy of recoding such doubly problematic data sets varied widely, depending on the type of compositional tree heterogeneity and on the recoding scheme. Interestingly, analysis of unrecoded compositionally heterogeneous alignments with the NDCH or CAT models was generally more accurate than homogeneous analysis, whether recoded or not. Overall, our results suggest that making trees for recoded amino acid data sets can be useful, but they need to be interpreted cautiously as part of a more comprehensive analysis. The use of better-fitting models like NDCH and CAT, which directly account for the patterns in the data, may offer a more promising long-term solution for analyzing empirical data. [Compositional heterogeneity; models of evolution; phylogenetic methods; recoding amino acid data sets.]
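To make the two ideas in the abstract concrete, the following is a minimal sketch of recoding an alignment to a reduced alphabet and measuring compositional heterogeneity with a Chi-squared statistic. It assumes the widely used Dayhoff six-category grouping, because the abstract does not name the exchangeability-based schemes that were tested. It also assumes the statistic is the ordinary Pearson X² on a taxa-by-state count table. The function names and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Dayhoff 6-category grouping. ASSUMPTION: chosen for illustration only;
# the abstract does not say which recoding schemes were used.
DAYHOFF6 = {
    "0": "AGPST",
    "1": "DENQ",
    "2": "HKR",
    "3": "ILMV",
    "4": "FWY",
    "5": "C",
}
RECODE = {aa: code for code, group in DAYHOFF6.items() for aa in group}

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
RECODED_STATES = "012345"


def recode(seq, table=RECODE, missing="-"):
    """Map each amino acid to its group code; anything else becomes a gap."""
    return "".join(table.get(aa, missing) for aa in seq.upper())


def composition_chi2(seqs, states):
    """Pearson X^2 for a taxa-by-state table of character counts.

    Expected counts are row_total * column_total / grand_total, so a value
    of 0 means every sequence has the same state proportions.
    """
    counts = [Counter(c for c in s if c in states) for s in seqs]
    row_totals = [sum(c.values()) for c in counts]
    col_totals = {st: sum(c[st] for c in counts) for st in states}
    grand = sum(row_totals)
    if grand == 0:
        return 0.0
    x2 = 0.0
    for c, row in zip(counts, row_totals):
        for st in states:
            expected = row * col_totals[st] / grand
            if expected > 0:  # skip states absent from the whole alignment
                x2 += (c[st] - expected) ** 2 / expected
    return x2


# Toy (hypothetical) two-taxon alignment: maximally different composition
# in the 20-state alphabet, identical composition after recoding.
toy = ["AAAA", "GGGG"]
print(composition_chi2(toy, AMINO_ACIDS))                         # 8.0
print(composition_chi2([recode(s) for s in toy], RECODED_STATES))  # 0.0
```

In this toy case recoding removes all measured heterogeneity because A and G fall in the same group. As the abstract cautions, a lower Chi-squared value by itself does not guarantee a more accurate tree.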

Journal: Systematic Biology	Publication Date: Jun 17, 2022
Citations: 14	License type: cc-by

Similar Papers

Compositional Heterogeneity and Phylogenomic Inference of Metazoan Relationships
M P Nesnidal ... I Bruchhaus
Molecular Biology and Evolution | VOL. 27
09 Apr 2010

Mitochondrial phylogenomics of early land plants: mitigating the effects of saturation, compositional heterogeneity, and codon-usage bias.
Yang Liu ... Bernard Goffinet
Systematic Biology | VOL. 63
28 Jul 2014

Decision letter: Support for a clade of Placozoa and Cnidaria in genes with minimal compositional bias
Antonis Rokas ... Diethard Tautz
05 May 2018

The Interaction between Base Compositional Heterogeneity and Among-Site Rate Variation in Models of Molecular Evolution
Nathan C Sheffield
ISRN Evolutionary Biology | VOL. 2013
26 Dec 2012