General continuous-time Markov model of sequence evolution via insertions/deletions: are alignment probabilities factorable?

Kiyoshi Ezawa

doi:10.1186/s12859-016-1105-7

Open Access

Abstract

Background: Insertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of sequence evolution through indel processes. Recent indel probabilistic models are mostly based on either hidden Markov models (HMMs) or transducer theories, both of which give the indel component of the probability of a given sequence alignment as a product of either probabilities of column-to-column transitions or block-wise contributions along the alignment. However, it is not a priori clear how these models are related to any genuine stochastic evolutionary model, which describes the stochastic evolution of an entire sequence along the time axis. Moreover, none of these models can currently fully accommodate biologically realistic features, such as overlapping indels, power-law indel-length distributions, and indel rate variation across regions.

Results: Here, we theoretically dissect the ab initio calculation of the probability of a given sequence alignment under a genuine stochastic evolutionary model, more specifically, a general continuous-time Markov model of the evolution of an entire sequence via insertions and deletions. Our model is a simple extension of the general “substitution/insertion/deletion (SID) model”. Using the operator representation of indels and the technique of time-dependent perturbation theory, we express the ab initio probability as a summation over all alignment-consistent indel histories. Exploiting the equivalence relations between different indel histories, we find a “sufficient and nearly necessary” set of conditions under which the probability can be factorized into the product of an overall factor and the contributions from regions separated by gapless columns of the alignment, thus providing a sort of generalized HMM. The conditions distinguish evolutionary models with factorable alignment probabilities from those without. The former category includes the “long indel” model (a space-homogeneous SID model) and the model used by Dawg, a genuine sequence evolution simulator.

Conclusions: With intuitive clarity and mathematical precision, our theoretical formulation will help further advance the ab initio calculation of alignment probabilities under biologically realistic models of sequence evolution via indels.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1105-7) contains supplementary material, which is available to authorized users.

Highlights

Insertions and deletions account for more nucleotide differences between two related DNA sequences than substitutions do, and it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of sequence evolution through indel processes.
Since the groundbreaking works by Bishop and Thompson [8] and by Thorne, Kishino and Felsenstein [9], many studies have sought to calculate the probabilities of pairwise alignments (PWAs) and multiple sequence alignments (MSAs) under probabilistic models aiming to incorporate the effects of indels.
To the best of our knowledge, this is the first study to theoretically dissect the ab initio calculation of alignment probabilities under a genuine stochastic evolutionary model, which describes the evolution of an entire sequence via insertions and deletions along the time axis.

Summary

Introduction

Insertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of sequence evolution through indel processes. Indel probabilistic models are mostly based on either hidden Markov models (HMMs) or transducer theories, both of which give the indel component of the probability of a given sequence alignment as a product of either probabilities of column-to-column transitions or block-wise contributions along the alignment. It is not a priori clear how these models are related to any genuine stochastic evolutionary model, which describes the stochastic evolution of an entire sequence along the time axis. Methods for calculating alignment probabilities have greatly improved in computational efficiency and scope of application (reviewed, e.g., in [10,11,12]). Most of these studies are based on HMMs (e.g., [13]) or transducer theories (e.g., [14]). The studies on these methods are steadily advancing (e.g., [15, 16]), and it seems that their mathematical and algorithmic bases are about to be established.
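As a rough illustration of what "a product of probabilities of column-to-column transitions" means in an HMM-style treatment, the sketch below classifies each column of a pairwise alignment as a match (M), insertion (I), or deletion (D) and multiplies hypothetical transition probabilities along the alignment. The transition table `TRANS`, the start state `S`, and the function names are assumptions made for this example only; they are not taken from the paper or from any published model, and the end-state transition is omitted for brevity.

```python
import math

# Hypothetical column-to-column transition probabilities (illustrative values
# only). "S" is an assumed start state; each row sums to 1.
TRANS = {
    "S": {"M": 0.90, "I": 0.05, "D": 0.05},
    "M": {"M": 0.90, "I": 0.05, "D": 0.05},
    "I": {"M": 0.60, "I": 0.35, "D": 0.05},
    "D": {"M": 0.60, "I": 0.05, "D": 0.35},
}


def column_states(ancestor, descendant):
    """Classify each alignment column as M (gapless), I (gap in ancestor),
    or D (gap in descendant)."""
    if len(ancestor) != len(descendant):
        raise ValueError("aligned sequences must have equal length")
    states = []
    for a, d in zip(ancestor, descendant):
        if a == "-" and d == "-":
            raise ValueError("a column cannot consist of gaps only")
        if a == "-":
            states.append("I")
        elif d == "-":
            states.append("D")
        else:
            states.append("M")
    return states


def indel_log_prob(ancestor, descendant):
    """Sum log transition probabilities over consecutive columns, i.e. the
    log of the product of column-to-column transition probabilities."""
    log_p = 0.0
    prev = "S"
    for state in column_states(ancestor, descendant):
        log_p += math.log(TRANS[prev][state])
        prev = state
    return log_p


if __name__ == "__main__":
    anc, desc = "ACG-T", "A-GCT"
    print(column_states(anc, desc))
    print(math.exp(indel_log_prob(anc, desc)))
```

Working in log space avoids numerical underflow on long alignments. A product of this kind is exactly the form whose relation to a genuine evolutionary model the paper sets out to examine.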

Journal: BMC Bioinformatics	Publication Date: Sep 17, 2016
Citations: 45	License type: CC BY 4.0

Similar Papers

General continuous-time Markov model of sequence evolution via insertions/deletions: local alignment probability computation.
Kiyoshi Ezawa
BMC Bioinformatics | VOL. 17
27 Sep 2016

PuMA: Bayesian analysis of partitioned (and unpartitioned) model adequacy
Jeremy M Brown ... Robert Eldabaje
Bioinformatics | VOL. 25
19 Dec 2008

Efficient algorithms for inverting evolution
Martin Farach ... Sampath Kannan
Journal of the ACM | VOL. 46
01 Jul 1999

Ribosomal RNA Phylogeny Derived from a Correlation Model of Sequence Evolution
A Von Haeseler ... M Schöniger
01 Jan 1996
