Several methods have recently been developed that allow the reconstruction of species trees from gene trees, an important achievement in our ongoing quest to obtain reliable species phylogenies. However, considerably less attention has been given to evaluating the accuracy of species tree estimates. Four methods for measuring branch support of species trees are tested in this study in a gene tree parsimony framework: 1) bootstrap lineages (BL) (sequences) within species, 2) bootstrap characters (BC) within genes (i.e., the standard nonparametric bootstrap), 3) bootstrap lineages and characters (BLC), and 4) posterior probability gene tree sampling (PPGTS) (where, for each resampled data set, gene trees are sampled according to their posterior probability). For each method, n species trees are reconstructed from n resampled data sets, and the support for a branch is the percentage of the n species trees in which that branch is recovered. The 4 methods were tested for several species trees and for different sampling efforts (i.e., number of genes and individuals sampled) using coalescent simulations. PPGTS performed best overall, with the lowest Type I and Type II error rates, followed by BLC. The BL and BC methods had higher error rates. This suggests that in order to properly measure branch support in a species tree context, it is important to account for the uncertainty involved in reconstructing gene trees from DNA sequences as well as that involved in reconstructing the species tree from individual gene trees. With the parameters used in the simulations, sampling more individuals per species improved support values about as much as sampling more genes. Moreover, sampling more individuals per species appeared to be important for escaping the anomaly zone present when only 1 sequence was sampled per species. We also apply the 4 methods to obtain branch supports for the species phylogeny of diploid wild roses (Rosa) in North America.
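The three bootstrap resampling schemes and the support calculation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The data layout (one gene as a dictionary mapping each species to a list of aligned sequences), the nested-tuple tree representation, and all function names are assumptions made for illustration. Gene tree and species tree inference are left out, as is PPGTS, which would need posterior samples of gene trees from a Bayesian analysis.

```python
import random
from collections import Counter

# Assumed layout: a gene is {species: [aligned sequence, ...]}.
# Assumed tree format: rooted nested tuples with string leaves.

def resample_lineages(gene, rng):
    """BL: resample sequences with replacement within each species."""
    return {sp: [rng.choice(seqs) for _ in seqs] for sp, seqs in gene.items()}

def resample_characters(gene, rng):
    """BC: resample alignment columns with replacement (standard bootstrap)."""
    length = len(next(iter(gene.values()))[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return {sp: ["".join(s[i] for i in cols) for s in seqs]
            for sp, seqs in gene.items()}

def resample_lineages_and_characters(gene, rng):
    """BLC: resample lineages first, then characters."""
    return resample_characters(resample_lineages(gene, rng), rng)

def clades(tree):
    """Return the non-trivial clades of a tree as frozensets of leaf names."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    root = leaves(tree)
    found.discard(root)  # the clade holding every leaf carries no information
    return found

def branch_support(replicate_trees):
    """Percentage of replicate species trees in which each clade is recovered."""
    counts = Counter()
    for tree in replicate_trees:
        counts.update(clades(tree))
    n = len(replicate_trees)
    return {clade: 100.0 * k / n for clade, k in counts.items()}

# Toy replicate species trees, standing in for trees inferred from n = 4
# resampled data sets.
replicates = [
    (("A", "B"), "C"),
    (("A", "B"), "C"),
    (("A", "C"), "B"),
    (("A", "B"), "C"),
]
support = branch_support(replicates)
print(support[frozenset({"A", "B"})])  # 75.0
```

In a full analysis, each resampled data set would be passed through gene tree inference and then species tree inference under gene tree parsimony, and the resulting n species trees would be given to `branch_support`.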