Abstract
Background
Phylogenies often contain both well-supported and poorly supported nodes. Determining how much additional data might be required to eventually recover most or all nodes with high support is an important pragmatic goal, and simulations have been used to examine this question. Most simulations have been based on few empirical loci and suggest that well-supported phylogenies can be determined with a very modest amount of data. Here we report the results of an empirical phylogenetic analysis of all 10 genera and 25 of 48 species of the New World pond turtles (family Emydidae), based on one mitochondrial locus (1070 base pairs) and seven nuclear loci (5961 base pairs). We also report a more biologically realistic simulation analysis that incorporates variation among gene trees and is aimed at determining how much more data might be necessary to recover weakly supported nodes with strong support.
Results
Our mitochondrial-based phylogeny was well resolved and congruent with some previous mitochondrial results. For example, all genera, and all species except Pseudemys concinna, P. peninsularis, and Terrapene carolina, were monophyletic with strong support from at least one analytical method. The Emydinae was recovered as monophyletic, but the Deirochelyinae was not. Based on nuclear data, all genera except Trachemys were monophyletic with strong support, and all species except Graptemys pseudogeographica, P. concinna, T. carolina, and T. coahuila were monophyletic, generally with strong support. However, the branches subtending most genera were relatively short, and intergeneric relationships within subfamilies were mostly unsupported.
Our simulations showed that relatively high bootstrap support values (i.e. ≥ 70) for all nodes were reached in all datasets, but an increase in data did not necessarily equate to an increase in support values. However, simulations based on a single empirical locus reached higher overall levels of support with less data than did simulations based on all seven empirical nuclear loci, and symmetric tree distances were much lower for single-gene than for multigene simulation analyses.
Conclusion
Our empirical results provide new insights into the phylogenetics of the Emydidae, but the short branches recovered deep in the tree also indicate the need for additional work on this clade to recover all intergeneric relationships with confidence and to delimit species in some problematic groups. Our simulation results suggest that moderate amounts of data (from a few to tens of kilobases) are necessary to recover most emydid relationships with high support values. They also suggest that previous simulations that do not incorporate topological variation among gene trees probably underestimate the amount of data needed to recover well-supported phylogenies.
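The abstract compares simulated datasets by bootstrap support (a cutoff of 70) and by symmetric tree distance. A minimal sketch of both measures follows, assuming each unrooted tree has already been reduced to its set of nontrivial bipartitions (splits). The function names, the taxon labels A through E, and the example trees are hypothetical illustrations, not the study's data or software.

```python
from typing import Dict, FrozenSet, Iterable, Set

Split = FrozenSet[str]


def normalize(split: Iterable[str], taxa: FrozenSet[str]) -> Split:
    """Encode a bipartition by the side that excludes a fixed reference taxon,
    so that {A,B} | {C,D,E} and {C,D,E} | {A,B} compare as equal."""
    side = frozenset(split)
    reference = min(taxa)  # alphabetically first taxon
    return taxa - side if reference in side else side


def symmetric_distance(tree_a: Iterable[Iterable[str]],
                       tree_b: Iterable[Iterable[str]],
                       taxa: FrozenSet[str]) -> int:
    """Symmetric-difference (Robinson-Foulds) distance: the number of splits
    found in exactly one of the two unrooted trees."""
    splits_a = {normalize(s, taxa) for s in tree_a}
    splits_b = {normalize(s, taxa) for s in tree_b}
    return len(splits_a ^ splits_b)


def well_supported(support: Dict[Split, float], threshold: float = 70.0) -> Set[Split]:
    """Splits whose bootstrap support meets or exceeds the threshold."""
    return {split for split, value in support.items() if value >= threshold}


# Hypothetical five-taxon example: ((A,B),C,(D,E)) versus ((A,C),B,(D,E)).
taxa = frozenset("ABCDE")
tree_1 = [{"A", "B"}, {"D", "E"}]
tree_2 = [{"A", "C"}, {"D", "E"}]
print(symmetric_distance(tree_1, tree_2, taxa))  # 2: one conflicting split in each tree
```

A distance of 0 means two trees share every split, so lower symmetric distances across replicates indicate that the simulated datasets converge on the same topology; the default threshold of 70 simply mirrors the support cutoff quoted above.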
Highlights
Phylogenies often contain both well-supported and poorly supported nodes
We explicitly model the effects of variation in gene tree topology and explore both the overall gains in phylogenetic resolution and the ability to recover particular problematic nodes with high support values when the quantity of sequence data is substantially increased
Empirical mitochondrial DNA (mtDNA) phylogeny: visual inspection of the cytb sequencing chromatograms of four samples revealed multiple peaks at some nucleotide positions, potentially indicating the presence of nuclear mitochondrial pseudogenes [42]
Summary
Phylogenies often contain both well-supported and poorly supported nodes. Determining how much additional data might be required to eventually recover most or all nodes with high support is an important pragmatic goal, and simulations have been used to examine this question. When difficult nodes are encountered, the logical step is to add taxa and/or data, under the reasonable assumption that additional taxa or characters might enable resolution and/or provide support for poorly supported nodes. Resolving difficult phylogenetic problems associated with short internodes, especially those deep in a tree, can be a substantial challenge [5,6,7] that often requires massive amounts of sequence data. This is not always the case, however, and robust species trees can sometimes be recovered from moderate amounts of data. The amount of data required for a given level of resolution, and the gain in phylogenetic accuracy for an increase in data sampling, depend on the true species tree, the rate of evolution of a particular marker, and the fit of the selected model of evolution to the actual substitution pattern of the data.
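The summary links the difficulty of deep nodes to short internodes, and the abstract links the data requirement to variation among gene trees. The source does not state the model behind its simulations, so the sketch below assumes the standard multispecies coalescent for three taxa, where a gene tree matches the rooted species tree ((A,B),C) with probability 1 − (2/3)e^(−T) for an internal branch of T coalescent units, and checks that formula by simulation. The function names and branch lengths are hypothetical illustrations, not the authors' simulation pipeline.

```python
import math
import random


def expected_concordance(internode: float) -> float:
    """Probability that a gene tree matches the rooted species tree ((A,B),C)
    under the multispecies coalescent, with `internode` the length of the
    internal branch in coalescent units."""
    return 1.0 - (2.0 / 3.0) * math.exp(-internode)


def simulate_concordance(internode: float, n_loci: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability over independent loci."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_loci):
        # The A and B lineages can coalesce (rate 1) on the internal branch.
        if rng.expovariate(1.0) < internode:
            matches += 1
        # Otherwise three lineages enter the root population, each of the
        # three pairs is equally likely to coalesce first, and only (A,B) matches.
        elif rng.randrange(3) == 0:
            matches += 1
    return matches / n_loci


for t in (0.05, 0.5, 2.0):
    print(t, round(expected_concordance(t), 3), simulate_concordance(t, 100_000))
```

With an internal branch of 0.05 coalescent units only about 37% of loci are expected to carry the species-tree topology, versus about 91% at 2.0 units, which illustrates why short internodes combined with gene tree variation can raise the amount of data needed for strong support.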