Bayesian coestimation of phylogeny and sequence alignment

Gerton Lunter, Jotun Hein, Alexei Drummond, Jens Ledet Jensen, István Miklós

doi:10.1186/1471-2105-6-83

Abstract

Background: Two central problems in computational biology are the determination of the alignment and phylogeny of a set of biological sequences. The traditional approach to this problem is to first build a multiple alignment of these sequences, followed by a phylogenetic reconstruction step based on this multiple alignment. However, alignment and phylogenetic inference are fundamentally interdependent, and ignoring this fact leads to biased and overconfident estimations. Whether the main interest be in sequence alignment or phylogeny, a major goal of computational biology is the co-estimation of both.

Results: We developed a fully Bayesian Markov chain Monte Carlo method for coestimating phylogeny and sequence alignment, under the Thorne-Kishino-Felsenstein model of substitution and single nucleotide insertion-deletion (indel) events. In our earlier work, we introduced a novel and efficient algorithm, termed the "indel peeling algorithm", which includes indels as phylogenetically informative evolutionary events, and resembles Felsenstein's peeling algorithm for substitutions on a phylogenetic tree. For a fixed alignment, our extension analytically integrates out both substitution and indel events within a proper statistical model, without the need for data augmentation at internal tree nodes, allowing for efficient sampling of tree topologies and edge lengths. To additionally sample multiple alignments, we here introduce an efficient partial Metropolized independence sampler for alignments, and combine these two algorithms into a fully Bayesian co-estimation procedure for the alignment and phylogeny problem.

Our approach results in estimates for the posterior distribution of evolutionary rate parameters, for the maximum a posteriori (MAP) phylogenetic tree, and for the posterior decoding alignment. Estimates for the evolutionary tree and multiple alignment are augmented with confidence estimates for each node height and alignment column. Our results indicate that the patterns in reliability broadly correspond to structural features of the proteins, and thus provide biologically meaningful information that is absent from the usual point estimate of the alignment. Our methods can handle input data of moderate size (10–20 protein sequences, each 100–200 bp), which we analyzed overnight on a standard 2 GHz personal computer.

Conclusion: Joint analysis of multiple sequence alignment, evolutionary trees and additional evolutionary parameters can now be done within a single coherent statistical framework.
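For readers unfamiliar with the peeling algorithm that the abstract refers to, the following is a minimal sketch of Felsenstein's classic peeling recursion for substitutions only, applied to one alignment column on a fixed tree. It is not the indel peeling algorithm of the paper and does not implement the Thorne-Kishino-Felsenstein model. The Jukes-Cantor substitution model, the nucleotide alphabet, the tuple encoding of the tree and all function names are assumptions chosen for illustration.

```python
import math

NUCS = "ACGT"  # illustrative alphabet; the paper itself analyzes protein sequences


def jc69_prob(a, b, t):
    """Jukes-Cantor probability of state b after time t, starting from state a.

    t is measured in expected substitutions per site (an assumption of this sketch).
    """
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def partial_likelihoods(node, column):
    """Return {state: P(observed leaves below node | node is in that state)}.

    A node is either a leaf name (str) or a tuple
    (left_child, left_branch_length, right_child, right_branch_length).
    This encoding is hypothetical and only serves the example.
    """
    if isinstance(node, str):
        # A leaf is compatible only with its observed character.
        return {x: 1.0 if x == column[node] else 0.0 for x in NUCS}
    left, t_left, right, t_right = node
    below_left = partial_likelihoods(left, column)
    below_right = partial_likelihoods(right, column)
    result = {}
    for x in NUCS:
        # Sum over the unobserved state at the far end of each child branch.
        sum_left = sum(jc69_prob(x, y, t_left) * below_left[y] for y in NUCS)
        sum_right = sum(jc69_prob(x, y, t_right) * below_right[y] for y in NUCS)
        result[x] = sum_left * sum_right
    return result


def column_likelihood(tree, column):
    """Likelihood of one column, using the uniform JC69 stationary distribution at the root."""
    root = partial_likelihoods(tree, column)
    return sum(0.25 * root[x] for x in NUCS)


# Hypothetical three-taxon tree and a single alignment column.
tree = (("human", 0.1, "chimp", 0.1), 0.2, "mouse", 0.3)
column = {"human": "A", "chimp": "A", "mouse": "G"}
print(column_likelihood(tree, column))
```

The recursion sums over the unobserved states at internal nodes one subtree at a time, so no internal states need to be sampled. The abstract describes the indel peeling algorithm as extending this idea so that indel events are integrated out as well.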

Highlights

Joint analysis of multiple sequence alignment, evolutionary trees and additional evolutionary parameters can be done within a single coherent statistical framework
Two central problems in computational biology are the determination of the alignment and phylogeny of a set of biological sequences
In this paper we present a new cosampling procedure for phylogenetic trees and sequence alignments
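The sampling idea behind the alignment moves can be illustrated with a generic Metropolized independence sampler on a toy discrete state space. This is only a sketch of the general technique: the paper's sampler is a partial one acting on alignments, whereas the states, target weights, proposal distribution and function name below are all assumptions made for the example.

```python
import random


def metropolized_independence_sampler(target, proposal_probs, n_steps, seed=0):
    """Sample from a distribution proportional to `target` over a finite state set.

    target:         dict state -> positive unnormalized target weight
    proposal_probs: dict state -> proposal probability, independent of the current state
    Both dicts and their contents are hypothetical inputs for this sketch.
    """
    rng = random.Random(seed)
    states = list(proposal_probs)
    weights = [proposal_probs[s] for s in states]

    def importance(s):
        # Importance weight w(s) = target(s) / proposal(s).
        return target[s] / proposal_probs[s]

    current = rng.choices(states, weights=weights)[0]
    samples = []
    for _ in range(n_steps):
        # The candidate is drawn without reference to the current state.
        candidate = rng.choices(states, weights=weights)[0]
        # Metropolis-Hastings acceptance for an independence proposal.
        ratio = importance(candidate) / importance(current)
        if rng.random() < min(1.0, ratio):
            current = candidate
        samples.append(current)
    return samples


# Toy usage: three states with target weights 1 : 2 : 7 and a uniform proposal.
target = {"a": 1.0, "b": 2.0, "c": 7.0}
proposal = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
samples = metropolized_independence_sampler(target, proposal, n_steps=20000)
print(samples.count("c") / len(samples))
```

Because proposals do not depend on the current state, the acceptance probability reduces to a ratio of importance weights. The sampler mixes well when the proposal distribution is close to the target.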

Summary

Introduction

Alignment and phylogenetic inference are fundamentally interdependent, and ignoring this fact leads to biased and overconfident estimations. Two central problems in computational biology are the determination of the alignment and phylogeny of a set of biological sequences. ClustalW [1] and T-Coffee [2] are popular sequence alignment packages, while MrBayes [3], PAUP* [4] and Phylip [5] all provide phylogenetic reconstruction and inference. ClustalW and T-Coffee compute their alignments based on a neighbour-joining guide tree, biasing subsequent phylogenetic estimates based on the resulting alignment. Fixing the alignment after the first step ignores the residual uncertainty in the alignment, resulting in an overconfident phylogenetic estimate.

Journal: BMC Bioinformatics	Publication Date: Apr 1, 2005
Citations: 235	License type: cc-by
