Abstract

Although Bayesian phylogenetic methodologies were first developed in the 1960s (Felsenstein, 1968, 2004), the approach remained relatively obscure until the initial release of the software application MrBayes (Huelsenbeck and Ronquist, 2001). Since that time, the popularity of Bayesian phylogenetics has increased tremendously, and it now must be considered a primary method of analysis on par with maximum likelihood, parsimony, and distance methods. The popularity of Bayesian analysis can be attributed to computational efficiencies that allow for explicit model-based analyses of large data sets in real time with simultaneous estimation of nodal support in the form of posterior probability values. Despite the initial enthusiasm generated by the availability of a fast likelihood-based approach, Bayesian phylogenetic analysis remains somewhat controversial. Much of the controversy is focused on two related issues: (1) the relationship between posterior probability values and nonparametric bootstrap proportions, with the nagging suspicion that posterior probabilities are too liberal (e.g., Suzuki et al., 2002), and (2) the influence of prior probabilities, especially so-called flat or uninformative priors, on the resulting Bayesian posteriors (Felsenstein, 2004; Zwickl and Holder, 2004; Pickett and Randle, 2005). Although a spate of simulation studies has been published during the past 2 years, most (Alfaro et al., 2003; Cummings et al., 2003; Douady et al., 2003; Erixon et al., 2003; Huelsenbeck and Rannala, 2004; Wilcox et al., 2002) have focused on the relationship between posterior probabilities and bootstrap proportions. The relative impact of priors on posteriors has only recently received the detailed study that is required to determine whether current Bayesian implementations are appropriate and, if not, how they might be corrected (e.g., Zwickl and Holder, 2004; Lewis et al., 2005).
Bayesian phylogenetic analysis requires the designation of prior probabilities for each parameter in the analysis, including those for alternative tree topologies, branch lengths, and the nucleotide substitution model. In each case, we usually have little a priori information that would allow us to select an appropriate informative prior distribution; thus, researchers generally attempt to accommodate their ignorance by applying uninformative priors. Because the posterior probability is proportional to the product of the prior probability and the likelihood, a truly uninformative prior should allow the likelihood function to drive the outcome of the analysis (Huelsenbeck et al., 2002; Lewis, 2001a; Zwickl and Holder, 2004). Unfortunately, the designation of truly uninformative priors is notoriously difficult (see Kass and Wasserman, 1996; Zwickl and Holder, 2004), and advocates proceed with the hope that the likelihood will overwhelm inappropriately informative priors when they cannot be avoided. The viability of Bayesian phylogenetics may depend on inferences being robust to these unavoidably informative priors. In a recent article, Pickett and Randle (2005; hereafter referred to as “PR” for the sake of brevity) provide one of the first investigations of the relationship between prior and posterior probabilities for Bayesian phylogenetic analysis when applying inappropriately informative priors (see also Zwickl and Holder, 2004). They correctly recognized that the designation of uninformative priors on the tree topology does not result in uninformative clade priors. (We note that the prior probability distribution of clades can be viewed either as the joint distribution over all splits or as the marginal prior distribution for each individual split; here we are concerned with the former interpretation.)
This point was clearly illustrated by PR with a simple example: if one considers a fully bifurcating five-taxon tree, there are 15 reconstructions linking any given pair of taxa and only 9 reconstructions linking any given combination of three taxa. Thus, with rooted trees, the prior probabilities of larger and smaller clades will be greater than those of clades of intermediate size. All else being equal, the posterior probabilities of smaller and larger clades should be inflated relative to those of clades of intermediate size. PR presented two examples of this phenomenon by analyzing both empirical DNA and contrived data sets. We first focus on the contrived data because we believe these are the only results in the PR study that clearly indicate that informative
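
The five-taxon counts quoted above can be checked with a short calculation. The sketch below is not taken from PR or from the present article. It assumes the standard counting result that there are (2n - 3)!! rooted, fully bifurcating, labeled topologies on n taxa, and that a specified set of k taxa forms a clade in (2k - 3)!! x (2(n - k + 1) - 3)!! of them. The function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def double_factorial_odd(m: int) -> int:
    """Return m!! for odd m, with the convention (-1)!! = 1!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def num_rooted_trees(n: int) -> int:
    """Count rooted, fully bifurcating, labeled topologies on n taxa: (2n - 3)!!."""
    return double_factorial_odd(2 * n - 3)


def trees_with_clade(n: int, k: int) -> int:
    """Count rooted topologies on n taxa in which one specified set of k taxa is a clade.

    Assumed counting argument: resolve the k taxa among themselves, then treat
    the clade as a single tip alongside the remaining n - k taxa.
    """
    return num_rooted_trees(k) * num_rooted_trees(n - k + 1)


def clade_prior(n: int, k: int) -> Fraction:
    """Prior probability of a specified clade of size k under a flat topology prior."""
    return Fraction(trees_with_clade(n, k), num_rooted_trees(n))


if __name__ == "__main__":
    n = 5
    for k in range(2, n):
        print(k, trees_with_clade(n, k), clade_prior(n, k))
```

Under these assumptions, the five-taxon case gives 105 rooted topologies in total, with 15 containing a given pair, 9 containing a given triple, and 15 containing a given group of four. The corresponding clade priors are 1/7, 3/35, and 1/7, so a flat prior on topologies favors the smallest and largest clades over those of intermediate size.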
