Abstract

Phylogenetic inference can be improved by developing and using better models of inference given the available data, or by gathering more appropriate data given the inferences to be made. Numerous studies have demonstrated the crucial importance of selecting a best-fit model for accurate phylogenetic inference from a given data set, explicitly revealing how model choice affects the results of phylogenetic inference. However, the importance of specifying a correct model of evolution when predicting which data are best to gather has never been examined. Here, we extend analyses of phylogenetic signal and noise that predict the potential to resolve nodes in a phylogeny to incorporate all time-reversible Markov models of nucleotide substitution. Extending previous results on the canonical four-taxon tree, our theory yields an analytical method that uses estimates of the rates of evolution and the model of molecular evolution to predict the distribution of signal, noise, and polytomy. We applied our methods to a study of 29 taxa of the yeast genus Candida and allied members to predict the power of five markers (COX2, ACT1, RPB1, RPB2, and D1/D2 LSU) to resolve a poorly supported backbone node corresponding to a clade of haploid Candida species, as well as nineteen other nodes that are relatively short and at least moderately deep in the consensus tree. Simple, unrealistic models that did not account for transition/transversion rate differences led to some discrepancies in predictions, but overall our results demonstrate that predictions of signal and noise in phylogenetics are fairly robust to model specification.
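Predictions of this kind rest on transition probabilities computed under a time-reversible substitution model. As an illustration only (not the authors' implementation), the sketch below builds a general time-reversible (GTR) rate matrix Q from six exchangeability rates and the stationary base frequencies, rescales it to one expected substitution per site per unit time, and computes P(t) = exp(Qt) by scaling and squaring. The function names, the dictionary layout for exchangeabilities, and the example parameter values are hypothetical choices made for this sketch.

```python
import math

BASES = "ACGT"


def gtr_rate_matrix(exch, freqs):
    """Build a GTR rate matrix Q, normalized to one expected substitution per unit time.

    exch  -- hypothetical layout: dict of exchangeabilities keyed by the sorted
             base pair ("AC", "AG", "AT", "CG", "CT", "GT").
    freqs -- stationary frequencies (pi_A, pi_C, pi_G, pi_T), summing to 1.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                pair = "".join(sorted(BASES[i] + BASES[j]))
                # Reversible parameterization: q_ij = r_ij * pi_j with r_ij = r_ji.
                q[i][j] = exch[pair] * freqs[j]
        # Rows of a rate matrix sum to zero.
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    # Mean rate at equilibrium; dividing by it puts branch lengths in substitutions/site.
    mu = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / mu for x in row] for row in q]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=20):
    """exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)  # max absolute row sum
    # Choose s so the scaled matrix has norm <= 0.5, where the series converges fast.
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)
    return result


def transition_matrix(q, t):
    """P(t) = exp(Q t): entry [i][j] is Pr(base j at time t | base i at time 0)."""
    return mat_exp([[x * t for x in row] for row in q])


# Hypothetical HKY-like parameters: transitions (AG, CT) four times faster than transversions.
hky_like = {"AC": 1.0, "AG": 4.0, "AT": 1.0, "CG": 1.0, "CT": 4.0, "GT": 1.0}
q = gtr_rate_matrix(hky_like, [0.3, 0.2, 0.2, 0.3])
p = transition_matrix(q, 0.1)  # branch of 0.1 expected substitutions per site
```

Setting all six exchangeabilities to 1 and all frequencies to 0.25 recovers the Jukes-Cantor model, while equal transversion rates and equal transition rates (as in the example) give HKY85, which reduces to K80 when base frequencies are equal. This is the transition/transversion distinction the abstract identifies as the source of some predictive discrepancies.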

Highlights

  • Phylogenetic inferences can be improved either by improving the models applied to data, or by improving the quality of the data

  • The use of simple, unrealistic models that did not take into account transition/transversion rate differences led to some discrepancies in predictions, but overall our results demonstrate that predictions of signal and noise in phylogenetics are fairly robust to model specification

  • We extended the phylogenetic signal and noise analysis of Townsend et al. (2012) by incorporating all time-reversible Markov models of nucleotide substitution into predictions of a data set's power to resolve a four-taxon (quartet) phylogeny

Introduction

Phylogenetic inferences can be improved either by improving the models applied to data or by improving the quality of the data. Enormous progress has been made in developing realistic, powerful evolutionary models for phylogenetic inference, and studies have demonstrated that applying the correct evolutionary model to a given data set can be essential to making correct inferences (e.g., Sullivan and Swofford, 1997; Kelsey et al., 1999; Ripplinger and Sullivan, 2010). Inference can also be improved by basing it on the loci that are most useful, or least misleading, for resolving the phylogeny at hand. The degree to which model selection affects the identification of optimal loci for phylogenetic inference, however, remains to be explored.
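One way to see how model choice could alter a locus-level prediction is by simulation. The sketch below is a swapped-in Monte Carlo stand-in, not the analytical signal-and-noise method described in this work: it evolves sites on the four-taxon tree ((A,B),(C,D)) under the Kimura two-parameter (K80) model, counts the parsimony-informative site patterns supporting each of the three possible splits, and scores each simulated locus as correct, incorrect, or a polytomy (a tie for the most-supported split). Equal terminal branch lengths, the pattern-count decision rule, and all function names and parameter values are assumptions for illustration.

```python
import math
import random

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def k80_probs(d, kappa):
    """K80 probabilities for a branch of d expected substitutions per site.

    Returns (p_same, p_transition, p_each_transversion); kappa = 1 is Jukes-Cantor.
    """
    beta_t = d / (kappa + 2.0)   # total rate away from a base is alpha + 2*beta
    alpha_t = kappa * beta_t
    e1 = math.exp(-4.0 * beta_t)
    e2 = math.exp(-2.0 * (alpha_t + beta_t))
    return (0.25 + 0.25 * e1 + 0.5 * e2,
            0.25 + 0.25 * e1 - 0.5 * e2,
            0.25 - 0.25 * e1)


def evolve(base, probs, rng):
    """Draw the descendant base at the end of a branch with K80 probabilities."""
    p_same, p_ts, p_tv = probs
    u = rng.random()
    if u < p_same:
        return base
    if u < p_same + p_ts:
        return TRANSITION[base]
    # The remaining mass (2 * p_tv) is split evenly between the two transversions.
    return TRANSVERSIONS[base][0 if u < p_same + p_ts + p_tv else 1]


def site_support(a, b, c, d):
    """Index of the split a site supports (0: AB|CD, 1: AC|BD, 2: AD|BC), or None.

    Only patterns with two states, each seen twice, count as informative.
    """
    if a == b and c == d and a != c:
        return 0
    if a == c and b == d and a != b:
        return 1
    if a == d and b == c and a != b:
        return 2
    return None


def resolve_locus(n_sites, tip_probs, int_probs, rng):
    """Simulate one locus on ((A,B),(C,D)) and classify the outcome."""
    counts = [0, 0, 0]
    for _ in range(n_sites):
        x = rng.choice("ACGT")          # K80 stationary distribution is uniform
        y = evolve(x, int_probs, rng)   # internode from the AB node to the CD node
        a, b = evolve(x, tip_probs, rng), evolve(x, tip_probs, rng)
        c, d = evolve(y, tip_probs, rng), evolve(y, tip_probs, rng)
        split = site_support(a, b, c, d)
        if split is not None:
            counts[split] += 1
    best = max(counts)
    if counts.count(best) > 1:
        return "polytomy"               # tie for the most-supported split
    return "correct" if counts[0] == best else "incorrect"


def predict(n_sites, tip, internode, kappa, replicates=1000, seed=1):
    """Monte Carlo estimate of Pr(correct), Pr(incorrect), Pr(polytomy)."""
    rng = random.Random(seed)
    tip_probs, int_probs = k80_probs(tip, kappa), k80_probs(internode, kappa)
    tally = {"correct": 0, "incorrect": 0, "polytomy": 0}
    for _ in range(replicates):
        tally[resolve_locus(n_sites, tip_probs, int_probs, rng)] += 1
    return {k: v / replicates for k, v in tally.items()}


# Same hypothetical locus and branch lengths, predicted under JC (kappa = 1) and K80 (kappa = 4).
jc = predict(n_sites=500, tip=0.3, internode=0.01, kappa=1.0, replicates=200)
k80 = predict(n_sites=500, tip=0.3, internode=0.01, kappa=4.0, replicates=200)
```

Comparing `jc` and `k80` for the same branch lengths and locus length gives two predicted distributions of correct, incorrect, and unresolved outcomes; their difference is a rough, simulation-based analogue of the model-sensitivity question raised in the paragraph above.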
