Comparison of methods for calculating conditional expectations of sufficient statistics for continuous time Markov chains

Paula Tataru, Asger Hobolth

doi:10.1186/1471-2105-12-465

Abstract

Background
Continuous time Markov chains (CTMCs) are a widely used model for describing the evolution of DNA sequences on the nucleotide, amino acid or codon level. The sufficient statistics for CTMCs are the time spent in a state and the number of changes between any two states. In applications, past evolutionary events (exact times and types of changes) are inaccessible, and the past must be inferred from DNA sequence data observed in the present.

Results
We describe and implement three algorithms for computing linear combinations of expected values of the sufficient statistics, conditioned on the end-points of the chain, and compare their performance with respect to accuracy and running time. The first algorithm is based on an eigenvalue decomposition of the rate matrix (EVD), the second on uniformization (UNI), and the third on integrals of matrix exponentials (EXPM). An R implementation of the algorithms is available at http://www.birc.au.dk/~paula/.

Conclusions
We use two different models to analyze the accuracy and eight experiments to investigate the speed of the three algorithms. We find that the three algorithms have similar accuracy and that EXPM is the slowest method. Furthermore, we find that UNI is usually faster than EVD.
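All three algorithms compute the same endpoint-conditioned quantities. Write P(t) = exp(Qt) and I_cd(T) = ∫_0^T P(t) e_c e_d^T P(T−t) dt. Then the expected time in state c, given X(0) = a and X(T) = b, is I_cc(T)_ab / P(T)_ab. The expected number of c→d jumps is q_cd I_cd(T)_ab / P(T)_ab.

The NumPy sketch below is an illustration only and is not the authors' R code. The names `evd_integrals`, `uni_integrals` and `conditional_stats`, and the 4-state rate matrix, are hypothetical. The sketch evaluates I_cd once with an eigendecomposition (EVD) and once with uniformization (UNI), and the two results should agree. EXPM would instead read I_cd off the upper-right block of the exponential of the block matrix [[Q, e_c e_d^T], [0, Q]]·T. That requires a general matrix exponential routine, so it is left out here.

```python
import numpy as np


def evd_integrals(Q, T, c, d):
    """Return (I, P) with I = int_0^T expm(Q t) e_c e_d^T expm(Q (T - t)) dt
    and P = expm(Q T), via Q = U diag(lam) U^-1.
    Assumes Q is diagonalizable; eigenvalues may be complex."""
    lam, U = np.linalg.eig(Q)
    Uinv = np.linalg.inv(U)
    e = np.exp(lam * T)
    diff = lam[:, None] - lam[None, :]
    equal = np.abs(diff) < 1e-10
    # J[i, j] = int_0^T exp(lam_i t) exp(lam_j (T - t)) dt
    J = np.where(equal, T * e[:, None],
                 (e[:, None] - e[None, :]) / np.where(equal, 1.0, diff))
    M = np.outer(Uinv[:, c], U[d, :]) * J
    I = U @ M @ Uinv
    P = (U * e) @ Uinv
    return I.real, P.real


def uni_integrals(Q, T, c, d):
    """Same quantities via uniformization: R = I + Q / mu, with the
    Poisson(mu T) series truncated by a fixed rule (moderate mu T only)."""
    n = Q.shape[0]
    mu = np.max(-np.diag(Q))
    R = np.eye(n) + Q / mu
    A = np.zeros((n, n))
    A[c, d] = 1.0
    m_max = int(mu * T + 10 * np.sqrt(mu * T) + 20)
    p = np.exp(-mu * T)      # Poisson probability of 0 events
    Rm = np.eye(n)           # R^m
    S = A.copy()             # S_m = sum_{l=0}^{m} R^l A R^(m-l)
    P = p * Rm
    I = np.zeros((n, n))
    for m in range(m_max + 1):
        p_next = p * mu * T / (m + 1)   # Poisson probability of m+1 events
        I += (p_next / mu) * S
        Rm = Rm @ R
        P += p_next * Rm
        S = S @ R + Rm @ A
        p = p_next
    return I, P


def conditional_stats(integrals, Q, T, a, b):
    """Expected time in each state and expected number of c -> d jumps,
    conditioned on X(0) = a and X(T) = b."""
    n = Q.shape[0]
    time = np.zeros(n)
    jumps = np.zeros((n, n))
    for c in range(n):
        for d in range(n):
            I, P = integrals(Q, T, c, d)
            if c == d:
                time[c] = I[a, b] / P[a, b]
            else:
                jumps[c, d] = Q[c, d] * I[a, b] / P[a, b]
    return time, jumps


# Hypothetical 4-state rate matrix (rows sum to zero).
Q = np.array([[-1.0, 0.3, 0.5, 0.2],
              [0.2, -0.9, 0.3, 0.4],
              [0.6, 0.1, -1.1, 0.4],
              [0.1, 0.5, 0.2, -0.8]])
T = 0.7
t_evd, n_evd = conditional_stats(evd_integrals, Q, T, 0, 2)
t_uni, n_uni = conditional_stats(uni_integrals, Q, T, 0, 2)
print(t_evd, t_evd.sum())            # the times sum to T up to rounding
print(np.abs(n_evd - n_uni).max())   # EVD and UNI should agree closely
```

The EVD branch treats eigenvalues closer than 1e-10 as equal and assumes Q is diagonalizable. The UNI branch uses a simple truncation bound, and exp(−μT) underflows for very large μT. A production implementation would need more care on both points.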

Summary

Introduction

Continuous time Markov chains (CTMCs) are a widely used model for describing the evolution of DNA sequences on the nucleotide, amino acid or codon level. The case where the CTMC is only recorded at discretely observed time points arises in molecular evolution, where DNA sequence data are extracted in the present day and past evolutionary events are missing. In this situation, efficient methods are needed for calculating the expectations of the sufficient statistics (the time spent in each state and the number of changes between states), conditioned on the observed end-points. The first class of applications is concerned with rate matrix estimation. [1] describes how the expectation-maximization (EM) algorithm can be applied to estimate the rate matrix from DNA sequence data observed in the leaves of an evolutionary tree. The EM algorithm is implemented in the software XRate [2] and has been
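In EM for rate matrix estimation, the conditional expectations form the E-step. The M-step then evaluates the standard complete-data maximum-likelihood estimate q_cd = N_cd / T_c, with the observed counts N_cd and times T_c replaced by their expected values.

The sketch below is a simplified, hypothetical illustration and is not the procedure of [1] or of XRate. It treats the data as independent endpoint pairs (a, b, t) rather than leaves on a tree, and it computes the E-step with an eigendecomposition. The function names, the data and the starting matrix `Q0` are all made up for the example.

```python
import numpy as np


def expected_stats(Q, t, a, b):
    """E-step for one endpoint pair (state a at time 0, state b at time t),
    via an eigendecomposition of Q (assumes Q is diagonalizable)."""
    lam, U = np.linalg.eig(Q)
    Uinv = np.linalg.inv(U)
    e = np.exp(lam * t)
    diff = lam[:, None] - lam[None, :]
    equal = np.abs(diff) < 1e-10
    J = np.where(equal, t * e[:, None],
                 (e[:, None] - e[None, :]) / np.where(equal, 1.0, diff))
    p_ab = ((U * e) @ Uinv)[a, b].real
    # I[c, d] = [int_0^t P(s) e_c e_d^T P(t - s) ds]_{ab} for all c, d at once
    X = U[a, :][:, None] * Uinv          # X[i, c] = U[a, i] Uinv[i, c]
    Y = U * Uinv[:, b][None, :]          # Y[d, j] = U[d, j] Uinv[j, b]
    I = (X.T @ J @ Y.T).real
    time = np.diag(I) / p_ab             # E[T_c | a, b]
    jumps = Q * I / p_ab                 # E[N_cd | a, b] for c != d
    np.fill_diagonal(jumps, 0.0)
    return time, jumps


def log_likelihood(Q, data):
    lam, U = np.linalg.eig(Q)
    Uinv = np.linalg.inv(U)
    return sum(np.log(((U * np.exp(lam * t)) @ Uinv)[a, b].real)
               for a, b, t in data)


def em_rate_matrix(data, Q0, n_iter=20):
    """EM for a rate matrix from independent endpoint pairs (a, b, t)."""
    Q = Q0.copy()
    history = [log_likelihood(Q, data)]
    for _ in range(n_iter):
        T_tot = np.zeros(Q.shape[0])
        N_tot = np.zeros_like(Q)
        for a, b, t in data:             # E-step: accumulate expectations
            time, jumps = expected_stats(Q, t, a, b)
            T_tot += time
            N_tot += jumps
        Q = N_tot / T_tot[:, None]       # M-step: q_cd = E[N_cd] / E[T_c]
        np.fill_diagonal(Q, -Q.sum(axis=1))
        history.append(log_likelihood(Q, data))
    return Q, history


# Hypothetical starting matrix and endpoint data for a 3-state chain.
Q0 = np.array([[-0.6, 0.4, 0.2],
               [0.3, -0.8, 0.5],
               [0.2, 0.3, -0.5]])
data = [(0, 0, 0.3), (0, 1, 0.3), (1, 1, 0.3), (1, 2, 0.3),
        (2, 2, 0.3), (2, 0, 0.3), (0, 2, 1.0), (1, 0, 1.0),
        (2, 1, 1.0), (0, 0, 1.0), (1, 1, 1.0), (2, 2, 1.0)]
Q_hat, history = em_rate_matrix(data, Q0, n_iter=15)
print(np.round(Q_hat, 3))
print(history[0], history[-1])
```

Because each E-step uses exact conditional expectations, the log-likelihood in `history` should not decrease from one iteration to the next. This is a convenient sanity check for any of the three expectation algorithms.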

Journal: BMC Bioinformatics
Publication Date: Dec 1, 2011
Citations: 40
License type: CC BY 2.0

Similar Papers

Summary Statistics for Endpoint-Conditioned Continuous-Time Markov Chains
Asger Hobolth ... Jens Ledet Jensen
Journal of Applied Probability, Vol. 48, 01 Dec 2011

Detecting Positively Selected Sites From Amino Acid Sequences: An Implicit Codon Model
Zheng Ouyang ... Jie Liang
01 Aug 2007

Computational Methods for CTMCs
Tuğrul Dayar ... William J Stewart
01 Jan 2010
