Abstract

Background

Phylogenies are essential to many areas of biology, but phylogenetic methods may give incorrect estimates under some conditions. A potentially common scenario of this type is when few taxa are sampled and terminal branches for the sampled taxa are relatively long. However, the best solution in such cases (i.e., sampling more taxa versus more characters) has been highly controversial. A widespread assumption in this debate is that added taxa must be complete (no missing data) in order to save analyses from the negative impacts of limited taxon sampling. Here, we evaluate whether incomplete taxa can also rescue analyses under these conditions (empirically testing predictions from an earlier simulation study).

Methodology/Principal Findings

We utilize DNA sequence data from 16 vertebrate species with well-established phylogenetic relationships. In each replicate, we randomly sample 4 species, estimate their phylogeny (using Bayesian, likelihood, and parsimony methods), and then evaluate whether adding in the remaining 12 species (which have 50, 75, or 90% of their data replaced with missing data cells) can improve phylogenetic accuracy relative to analyzing the 4 complete taxa alone. We find that in those cases where sampling few taxa yields an incorrect estimate, adding taxa with 50% or 75% missing data can frequently (>75% of relevant replicates) rescue Bayesian and likelihood analyses, recovering accurate phylogenies for the original 4 taxa. Even taxa with 90% missing data can sometimes be beneficial.

Conclusions

We show that adding taxa that are highly incomplete can improve phylogenetic accuracy in cases where analyses are misled by limited taxon sampling. These surprising empirical results confirm those from simulations, and show that the benefits of adding taxa may be obtained with unexpectedly small amounts of data. These findings have important implications for the debate on sampling taxa versus characters, and for studies attempting to resolve difficult phylogenetic problems.
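
The replicate design described in the abstract (4 randomly chosen complete taxa plus 12 taxa with a fixed share of their data replaced by missing-data cells) can be illustrated with a small sketch. This is not the authors' pipeline: the function names `mask_sequence` and `build_replicate`, the use of `?` as the missing-data symbol, the random choice of which cells to blank, and the toy random sequences are all assumptions made for illustration. The abstract does not say how the missing cells were distributed, and tree estimation itself is left out.

```python
import random

MISSING = "?"  # assumed missing-data symbol (common in alignment formats)


def mask_sequence(seq, fraction, rng):
    """Replace a fixed fraction of the sites in seq with the missing-data symbol.

    Assumption: the masked sites are chosen uniformly at random.
    """
    n_missing = round(len(seq) * fraction)
    masked_sites = set(rng.sample(range(len(seq)), n_missing))
    return "".join(
        MISSING if i in masked_sites else base for i, base in enumerate(seq)
    )


def build_replicate(alignment, n_complete=4, missing_fraction=0.5, seed=None):
    """Pick n_complete taxa to keep complete and mask all remaining taxa.

    alignment maps taxon name -> aligned sequence (hypothetical input format).
    Returns the set of complete taxa and the new taxon -> sequence mapping.
    """
    rng = random.Random(seed)
    taxa = sorted(alignment)
    complete = set(rng.sample(taxa, n_complete))
    replicate = {
        taxon: alignment[taxon]
        if taxon in complete
        else mask_sequence(alignment[taxon], missing_fraction, rng)
        for taxon in taxa
    }
    return complete, replicate


# Toy data: 16 made-up taxa with random 200-site sequences (not real data).
toy_rng = random.Random(0)
toy_alignment = {
    f"taxon_{i:02d}": "".join(toy_rng.choice("ACGT") for _ in range(200))
    for i in range(16)
}

# One replicate per missing-data level used in the study (50, 75, 90%).
replicates = {
    level: build_replicate(toy_alignment, missing_fraction=level, seed=1)
    for level in (0.5, 0.75, 0.9)
}
```

In a full analysis, each replicate would be analyzed twice (the 4 complete taxa alone, then all 16 taxa), and the relationships recovered among the original 4 taxa would be compared against the well-established phylogeny.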

Highlights

  • Biologists are becoming increasingly aware that accurate estimates of phylogeny are critical to many areas of research, from genomics to community ecology to the identification and spread of emerging pathogens

  • We show that adding taxa that are highly incomplete can improve phylogenetic accuracy in cases where analyses are misled by limited taxon sampling

  • Our results suggest that the inaccurate estimates obtained with limited taxon sampling may be caused by long-branch attraction

Summary

Introduction

Biologists are becoming increasingly aware that accurate estimates of phylogeny are critical to many areas of research, from genomics to community ecology to the identification and spread of emerging pathogens. However, there are conditions under which phylogenetic methods may give highly inaccurate estimates of phylogeny [1]. One such situation is when few taxa are sampled and branches among some of the sampled taxa are relatively long (i.e., many changes are expected or have occurred on these branches). The problem of inaccurate estimation when branches are long can potentially be resolved either by adding more taxa to an analysis or by adding more characters. Despite the potential relevance of taxon sampling versus character sampling to most phylogenetic studies (especially those of higher taxa), the issue remains unresolved. We evaluate whether incomplete taxa can rescue analyses under these conditions (empirically testing predictions from an earlier simulation study).

