Bayesian and maximum likelihood phylogenetic analyses of protein sequence data under relative branch-length differences and model violation

Jessica C Mar, Timothy J Harlow, Mark A Ragan

doi:10.1186/1471-2148-5-8

Abstract

Background
Bayesian phylogenetic inference holds promise as an alternative to maximum likelihood, particularly for large molecular-sequence data sets. We have investigated the performance of Bayesian inference with empirical and simulated protein-sequence data under conditions of relative branch-length differences and model violation.

Results
With empirical protein-sequence data, Bayesian posterior probabilities provide more-generous estimates of subtree reliability than does the nonparametric bootstrap combined with maximum likelihood inference, reaching 100% posterior probability at bootstrap proportions around 80%. With simulated 7-taxon protein-sequence datasets, Bayesian posterior probabilities are somewhat more generous than bootstrap proportions, but do not saturate. For datasets of this size, Bayesian phylogenetic inference can be as robust as maximum likelihood to relative branch-length differences, or more so, particularly when among-sites rate variation is modeled using a gamma distribution. When the (known) correct model was used to infer trees, Bayesian inference recovered the (known) correct tree in 100% of instances in which one or two branches were up to 20-fold longer than the others. At ratios more extreme than 20-fold, topological accuracy of reconstruction degraded only slowly when only one branch was of relatively greater length, but more rapidly when there were two such branches. Under an incorrect model of sequence change, inaccurate trees were sometimes observed at less extreme branch-length ratios, and (particularly for trees with single long branches) these trees tended to be more inaccurate. The effect of model violation on accuracy of reconstruction for trees with two long branches was more variable, but gamma-corrected Bayesian inference nonetheless yielded more-accurate trees than did either maximum likelihood or uncorrected Bayesian inference across the range of conditions we examined. Assuming an exponential Bayesian prior on branch lengths did not improve performance, and under certain extreme conditions significantly diminished it. The two topology-comparison metrics we employed, edit distance and Robinson-Foulds symmetric distance, yielded different but highly complementary measures of performance.

Conclusions
Our results demonstrate that Bayesian inference can be relatively robust against biologically reasonable levels of relative branch-length differences and model violation, and thus may provide a promising alternative to maximum likelihood for inference of phylogenetic trees from protein-sequence data.
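The Robinson-Foulds symmetric distance mentioned in the abstract counts the bipartitions (splits) of the taxon set that occur in one unrooted tree but not in the other. The following is a minimal sketch of that count, not the authors' code. It assumes trees are written as nested Python tuples of taxon names, a hypothetical representation chosen only for illustration. Conventions differ between programs, and some report half of this count.

```python
def taxa(tree):
    """All taxon names in a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(taxa(child) for child in tree))


def splits(tree):
    """Non-trivial splits of the unrooted tree underlying `tree`.

    Each split is stored as the side that excludes a reference taxon (the
    alphabetically first one), so the same bipartition compares equal no
    matter where the nested-tuple input happens to be rooted.
    """
    clusters = []

    def visit(node):
        # A leaf contributes only itself; an internal node contributes the
        # union of the taxa below it, which is recorded as a candidate split.
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        clusters.append(below)
        return below

    all_taxa = visit(tree)
    ref = min(all_taxa)
    result = set()
    for below in clusters:
        # Keep only splits with at least two taxa on each side.
        if 1 < len(below) < len(all_taxa) - 1:
            result.add(below if ref not in below else all_taxa - below)
    return result


def robinson_foulds(t1, t2):
    """Number of splits found in exactly one of the two trees."""
    if taxa(t1) != taxa(t2):
        raise ValueError("trees must be on the same taxon set")
    return len(splits(t1) ^ splits(t2))


t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
print(robinson_foulds(t1, t2))  # 2: AB|CDE and AC|BDE each occur in only one tree
```

Storing each split relative to a fixed reference taxon is what makes the comparison independent of rooting. For example, `("A", ("B", (("D", "E"), "C")))` describes the same unrooted topology as `t1` and is at distance 0 from it.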

Highlights

Bayesian phylogenetic inference holds promise as an alternative to maximum likelihood, particularly for large molecular-sequence data sets
Bayesian posterior probabilities (PPs) and nonparametric bootstrap proportions (BPs) are not commensurate [17,48] and may be seen as "potential upper and lower bounds of node reliability", respectively. (Being more generous than a too-conservative measure does not imply that Bayesian PPs must be too generous.) Our results strongly suggest that the interpretation of BPs and PPs being developed for nucleotide sequences will be applicable to protein sequences as well
Bayesian inference can be as robust as maximum likelihood (ML) against relative branch-length differences of 20-fold or greater when inferring correct topologies from protein-sequence data; the details depend on the number of relatively long branches, the presence or absence of an effective correction for among-sites rate variation (ASRV), and other factors
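Bootstrap proportions and posterior probabilities, which are contrasted in the highlights above, are usually tallied in the same way. The support for a subtree is the fraction of sampled trees that contain the corresponding split. For BPs, those trees are ML trees inferred from bootstrap replicates. For PPs, they are topologies sampled by MCMC after burn-in. The two measures therefore differ in where the trees come from, not in the counting step. This is a sketch under stated assumptions: the nested-tuple tree format and the example tree sample are hypothetical, and the code does not reproduce the authors' pipeline.

```python
from collections import Counter


def splits(tree):
    """Non-trivial splits of an unrooted tree given as nested tuples of names.

    Each split is stored as the side excluding the alphabetically first taxon,
    so identical bipartitions compare equal regardless of input rooting.
    """
    clusters = []

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        clusters.append(below)
        return below

    all_taxa = visit(tree)
    ref = min(all_taxa)
    return {
        below if ref not in below else all_taxa - below
        for below in clusters
        if 1 < len(below) < len(all_taxa) - 1  # at least two taxa per side
    }


def split_support(trees):
    """Fraction of the sampled trees (bootstrap or MCMC) containing each split."""
    counts = Counter()
    for tree in trees:
        counts.update(splits(tree))
    return {split: n / len(trees) for split, n in counts.items()}


# Hypothetical sample: three trees share the AB|CDE split, one does not.
sample = [(("A", "B"), ("C", ("D", "E")))] * 3 + [(("A", "C"), ("B", ("D", "E")))]
for split, support in sorted(split_support(sample).items(), key=lambda kv: -kv[1]):
    print(sorted(split), support)
```

With this sample, the DE split is present in every tree (support 1.0), the CDE side of AB|CDE in three of four (0.75), and the BDE side of AC|BDE in one (0.25).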

Summary

Introduction

Bayesian phylogenetic inference holds promise as an alternative to maximum likelihood, particularly for large molecular-sequence data sets. Likelihood-based approaches have proven especially powerful for inferring phylogenetic trees [1,2] but are computationally expensive. The cost comes both from the form of the likelihood function itself and from the need to search the multidimensional space of possible trees (tree space) for optimal ones. This computation must be repeated, typically 100–1000 times, if the nonparametric bootstrap [3] is used to estimate the support for specific subtrees. The much faster RELL approximation [4,5] can in principle replace the bootstrap, but so far it has not been extensively investigated with large datasets [6].
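The contrast between the full bootstrap and RELL described above can be sketched briefly. The full nonparametric bootstrap resamples alignment columns with replacement and reruns the tree search for every replicate. RELL ("resampling estimated log-likelihoods") instead computes per-site log-likelihoods once for a fixed set of candidate trees. For each replicate it resamples those per-site values and records which candidate has the highest total, with no re-optimisation of topology, branch lengths or model parameters. In this minimal sketch the per-site log-likelihoods are hypothetical inputs, assumed to have been computed elsewhere (for example by an ML program), and the function name `rell_support` is illustrative only.

```python
import random


def rell_support(site_lnl, n_reps=1000, seed=1):
    """RELL bootstrap support for a fixed set of candidate trees.

    site_lnl maps a tree label to its per-site log-likelihoods, one value per
    alignment column; every list must have the same length. Returns the
    fraction of replicates in which each tree has the highest total.
    """
    labels = list(site_lnl)
    n_sites = len(site_lnl[labels[0]])
    if any(len(values) != n_sites for values in site_lnl.values()):
        raise ValueError("all trees need one log-likelihood per site")

    rng = random.Random(seed)
    wins = dict.fromkeys(labels, 0)
    for _ in range(n_reps):
        # Nonparametric bootstrap step: draw alignment columns with replacement.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        # Re-sum the stored per-site values; no tree search, no re-optimisation.
        totals = {lab: sum(site_lnl[lab][i] for i in cols) for lab in labels}
        # On exact ties, max() keeps the first label in input order.
        wins[max(totals, key=totals.get)] += 1
    return {lab: w / n_reps for lab, w in wins.items()}


# Hypothetical per-site log-likelihoods for two candidate trees on 6 sites.
example = {
    "T1": [-2.1, -3.0, -1.5, -4.2, -2.2, -3.3],
    "T2": [-2.4, -2.9, -1.9, -4.0, -2.6, -3.1],
}
print(rell_support(example, n_reps=500))
```

Each replicate here costs only a resampling and a few sums, which is why RELL is so much cheaper than rerunning ML inference 100–1000 times. The trade-off is that it can only rank trees already in the candidate set.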

Journal: BMC Evolutionary Biology	Publication Date: Jan 1, 2005
Citations: 99	License type: CC BY 2.0

Similar Papers

Molecular systematics of Middle American harvest mice Reithrodontomys (Muridae), estimated from mitochondrial cytochrome b gene sequences
Elizabeth Arellano ... Duke S Rogers
Molecular Phylogenetics and Evolution | VOL. 37 | 26 Sep 2005

Variational Bayes method for Mixture of Principal Component Analyzers
Shigeyuki Oba ... Shin Ishii
Systems and Computers in Japan | VOL. 34 | 18 Aug 2003

Bayesian Statistical Modeling Using JAGS
Michael Schaub ... Marc Kéry
01 Jan 2021

The Devil in the Details: Interactions between the Branch-Length Prior and Likelihood Model Affect Node Support and Branch Lengths in the Phylogeny of the Psoraceae
Stefan Ekman ... Rakel Blaalid
Systematic Biology | VOL. 60 | 24 Mar 2011