Progressive multiple sequence alignment with indel evolution

Massimo Maiolo, Manuel Gil, Xiaolei Zhang, Maria Anisimova

doi:10.1186/s12859-018-2357-1

Abstract

Background: Sequence alignment is crucial in genomics studies. However, optimal multiple sequence alignment (MSA) is NP-hard. Thus, modern MSA methods employ progressive heuristics, breaking the problem into a series of pairwise alignments guided by a phylogeny. Changes between homologous characters are typically modelled by a Markov substitution model. In contrast, the dynamics of indels are not modelled explicitly, because the computation of the marginal likelihood under such models has exponential time complexity in the number of taxa. But the failure to model indel evolution may lead to artificially short alignments due to biased indel placement, inconsistent with the phylogenetic relationships.

Results: Recently, the classical indel model TKF91 was modified to describe indel evolution on a phylogeny via a Poisson process, termed PIP. PIP allows the joint marginal probability of an MSA and a tree to be computed in linear time. We present a new dynamic programming algorithm to align two MSAs (represented by the underlying homology paths) by full maximum likelihood under PIP in polynomial time, and apply it progressively along a guide tree. We have corroborated the correctness of our method by simulation, and compared it with competitive methods on an illustrative real dataset.

Conclusions: Our MSA method is the first polynomial time progressive aligner with a rigorous mathematical formulation of indel evolution. The new method infers phylogenetically meaningful gap patterns, offering an alternative to the popular PRANK, while producing alignments of similar length. Moreover, the inferred gap patterns agree with what was predicted qualitatively by previous studies. The algorithm is implemented in a standalone C++ program: https://github.com/acg-team/ProPIP. Supplementary data are available at BMC Bioinformatics online.

Highlights

Sequence alignment is crucial in genomics studies.
Multiple sequence alignment (MSA) estimation is among the oldest bioinformatics problems, yet it remains intensely studied because of its complexity (NP-hard [2,3,4]).
The computation of the marginal likelihood under the classical indel models TKF91 [11] and TKF92 [12] is exponential in the number of taxa, owing to the absence of a site independence assumption.

Summary

Introduction

Sequence alignment is crucial in genomics studies, and multiple sequence alignments (MSAs) are routinely required in the early stages of comparative and evolutionary genomics studies. MSA estimation is among the oldest bioinformatics problems, yet it remains intensely studied because of its complexity: optimal MSA is NP-hard [2,3,4]. All state-of-the-art MSA programs nowadays use an evolutionary model to describe changes between homologous characters, providing a more realistic description of molecular data and more accurate inferences. The dynamics of indels, in contrast, are not modelled explicitly, because computing the marginal likelihood under explicit indel models has exponential time complexity in the number of taxa. In particular, the computation of the marginal likelihood under the classical indel models TKF91 [11] and TKF92 [12] is exponential in the number of taxa, owing to the absence of a site independence assumption.
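The abstract describes progressive alignment as a series of pairwise alignments guided by a phylogeny. The sketch below illustrates only that general strategy and is not the authors' method: it swaps the PIP maximum-likelihood objective for a plain sum-of-pairs score inside a Needleman-Wunsch recursion. The function names, the nested-tuple guide tree format, and the scoring values (match 1, mismatch -1, gap -2) are all assumptions made for illustration.

```python
from itertools import product

GAP = "-"

def column_score(col_a, col_b, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score between two alignment columns (tuples of characters).
    Illustrative scoring only; the paper uses a likelihood under PIP instead."""
    total = 0
    for x, y in product(col_a, col_b):
        if x == GAP and y == GAP:
            continue  # a gap facing a gap contributes nothing
        if x == GAP or y == GAP:
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total

def align_profiles(a, b):
    """Align two MSAs (lists of equal-length rows) by dynamic programming."""
    cols_a, cols_b = list(zip(*a)), list(zip(*b))
    n, m = len(cols_a), len(cols_b)
    gap_a, gap_b = (GAP,) * len(a), (GAP,) * len(b)

    # score[i][j]: best score for the first i columns of a and first j of b
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + column_score(cols_a[i - 1], gap_b)
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + column_score(gap_a, cols_b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]),
                score[i - 1][j] + column_score(cols_a[i - 1], gap_b),
                score[i][j - 1] + column_score(gap_a, cols_b[j - 1]),
            )

    # Trace back from the bottom-right corner to recover the merged columns.
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j] ==
                score[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1])):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + column_score(cols_a[i - 1], gap_b):
            merged.append(cols_a[i - 1] + gap_b)
            i -= 1
        else:
            merged.append(gap_a + cols_b[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)]

def progressive_align(tree, seqs):
    """Post-order traversal of a guide tree given as nested 2-tuples of leaf names.
    Returns (leaf names in row order, aligned rows)."""
    if isinstance(tree, str):
        return [tree], [seqs[tree]]
    left, right = tree
    names_l, rows_l = progressive_align(left, seqs)
    names_r, rows_r = progressive_align(right, seqs)
    return names_l + names_r, align_profiles(rows_l, rows_r)

if __name__ == "__main__":
    toy = {"A": "ACGT", "B": "ACT", "C": "AGT"}  # made-up sequences
    names, rows = progressive_align((("A", "B"), "C"), toy)
    for name, row in zip(names, rows):
        print(name, row)
```

Each internal node of the guide tree triggers one pairwise alignment of the two sub-alignments below it, so the number of dynamic programming passes grows linearly with the number of taxa. Gaps placed at a lower node are never revisited higher up, which is why the choice of objective at each node (here a simple score, in the paper a likelihood under PIP) shapes the final gap pattern.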

Journal: BMC Bioinformatics	Publication Date: Sep 21, 2018
Citations: 21	License type: open-access
