Abstract

The debate about whether phylogenetic accuracy is most efficiently increased by sampling more characters or more taxa is certainly not new (e.g., Kim, 1996; Graybeal, 1998; Poe, 1998a,b; Rannala et al., 1998; Poe and Swofford, 1999; Pollock and Bruno, 2000; Rosenberg and Kumar, 2001; Pollock et al., 2002; Zwickl and Hillis, 2002; Rosenberg and Kumar, 2003; Hillis et al., 2003). However, the recent increase in whole-genome sequences available from an assortment of distantly related taxa makes this debate highly relevant to researchers across fields of biology. Recently, Rokas et al. (2003) argued that the true species tree can be recovered despite conflicting phylogenetic signal between genes if enough genes are used in the analysis. Using the bootstrap proportion (BP) as a measure of phylogenetic accuracy, they concluded that approximately 20 genes are needed to ensure a robustly supported tree (>95% BP) for their study group of eight yeast taxa. From these empirical results, they generalized that most molecular phylogenetic studies have probably included too few genes to confidently resolve relationships within their respective focal groups. This approach to measuring accuracy can be sensitive to method inconsistency, that is, the failure of a method to converge on the correct tree as the data set becomes infinitely large. When a method is inconsistent, measures of support such as nonparametric bootstrapping can increase as more sequence data are added, but in support of the wrong phylogeny (Phillips et al., 2004; Collins et al., 2005; Delsuc et al., 2005). Although most methods perform well over most of tree space (Huelsenbeck, 1995; Poe, 2003), regions of inconsistency have been identified in the literature for all of the most commonly used phylogenetic methods.
For example, compositional bias can affect the accuracy of minimum evolution (Phillips et al., 2004), model misspecification may affect parametric methods such as maximum likelihood (ML) (Poe, 2003; Philippe et al., 2005; Collins et al., 2005), and branch-length asymmetry can lead to inconsistency in maximum parsimony (Felsenstein, 1978; Hendy and Penny, 1989). Parsimony is particularly prone to long-branch attraction (LBA), an analytical artifact in which two taxa on long branches are incorrectly placed as sister taxa (Felsenstein, 1978; Hendy and Penny, 1989; Huelsenbeck and Hillis, 1993). Although there are many reasons for conflicting phylogenetic signal between genes, one relevant reason may be method inconsistency: differing rates of evolution between genes could cause a particular method to be inconsistent for some genes and not for others. We argue that by addressing this source of conflict between genes, fewer genes may be needed to recover an accurate phylogeny. One source of conflict in the Rokas et al. (2003) data set may be nonstationarity: taxa that differ from the others in their base compositional bias may be erroneously drawn together as sister taxa (Collins et al., 2005). Here, we show that an additional source of conflict between the 106 genes in the Rokas et al. data set may be branch-length asymmetry. Using simulations of 106 genes from the Rokas et al. data set on a 79-taxon yeast phylogeny, we also show that, in the presence of LBA, adding genes to a data set can increase support for the wrong reconstruction. However, when taxa are added to the analysis, support for the correct reconstruction increases, and fewer genes are needed to achieve accuracy.
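The interaction between LBA and bootstrap support described above can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions, not the 106-gene, 79-taxon simulation reported here. It assumes a four-taxon tree ((A,B),(C,D)) with hypothetical branch lengths that put the two long branches, A and C, in non-sister positions. It also assumes Jukes-Cantor (JC69) sequence evolution and maximum parsimony scored on parsimony-informative sites. All names and parameter values (`LONG`, `SHORT`, `run`, the site counts) are illustrative choices.

```python
import math
import random
from collections import Counter

# Hypothetical branch lengths (expected substitutions per site) placing the
# true tree ((A,B),(C,D)) in the "Felsenstein zone": A and C are long and
# non-sister; B, D, and the internal branch are short.
LONG, SHORT = 1.0, 0.05
BASES = "ACGT"
SPLITS = ("AB|CD", "AC|BD", "AD|BC")  # AB|CD is the true split


def evolve(state, t, rng):
    """Evolve one nucleotide along a branch of length t under JC69."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    if rng.random() < p_change:
        return rng.choice([b for b in BASES if b != state])
    return state


def simulate_site(rng):
    """Return tip states (A, B, C, D) for one site on ((A,B),(C,D))."""
    u = rng.choice(BASES)          # node joining A and B, used as the root
    v = evolve(u, SHORT, rng)      # node joining C and D, across the internal branch
    return (evolve(u, LONG, rng), evolve(u, SHORT, rng),
            evolve(v, LONG, rng), evolve(v, SHORT, rng))


def split_supported(site):
    """Return the split a site supports under parsimony, or None.

    For four taxa, only sites with two states, each seen in exactly two
    taxa, are informative; all other sites cost the same on every topology.
    """
    a, b, c, d = site
    if len(set(site)) != 2 or Counter(site).most_common(1)[0][1] != 2:
        return None
    if a == b:
        return "AB|CD"
    if a == c:
        return "AC|BD"   # the two long branches grouped together (LBA)
    return "AD|BC"


def bootstrap_proportions(labels, n_reps, rng):
    """Nonparametric bootstrap: resample sites with replacement, rerun parsimony.

    Parsimony picks the split with the most supporting sites; tied
    replicates are credited evenly to every tied split.
    """
    support = dict.fromkeys(SPLITS, 0.0)
    for _ in range(n_reps):
        counts = Counter(rng.choices(labels, k=len(labels)))
        best = max(counts[s] for s in SPLITS)
        winners = [s for s in SPLITS if counts[s] == best]
        for s in winners:
            support[s] += 1.0 / len(winners)
    return {s: support[s] / n_reps for s in SPLITS}


def run(n_sites, n_reps=200, seed=1):
    """Simulate n_sites sites and return bootstrap proportions per split."""
    rng = random.Random(seed)
    labels = [split_supported(simulate_site(rng)) for _ in range(n_sites)]
    return bootstrap_proportions(labels, n_reps, rng)


for n in (20, 200, 2000):
    bp = run(n)
    print(n, {s: round(p, 2) for s, p in bp.items()})
```

With these assumed branch lengths, sites that group A with C are far more frequent than sites supporting the true split. The bootstrap proportion for the wrong split AC|BD therefore tends toward 1 as sites are added, which mirrors the behavior of an inconsistent method described above. The sketch does not model the remedy discussed here, adding taxa that subdivide the long branches.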
