Fast MCMC sampling for hidden Markov models to determine copy number variations

Md Pavel Mahmud, Alexander Schliep

doi:10.1186/1471-2105-12-428

Abstract

Background: Hidden Markov Models (HMM) are often used for analyzing Comparative Genomic Hybridization (CGH) data to identify chromosomal aberrations or copy number variations by segmenting observation sequences. For efficiency reasons the parameters of an HMM are often estimated with maximum likelihood and a segmentation is obtained with the Viterbi algorithm. This introduces considerable uncertainty in the segmentation, which can be avoided with Bayesian approaches integrating out parameters using Markov Chain Monte Carlo (MCMC) sampling. While the advantages of Bayesian approaches have been clearly demonstrated, the likelihood-based approaches are still preferred in practice for their lower running times; datasets coming from high-density arrays and next-generation sequencing amplify these problems.

Results: We propose an approximate sampling technique, inspired by compression of discrete sequences in HMM computations and by kd-trees to leverage spatial relations between data points in typical data sets, to speed up the MCMC sampling.

Conclusions: We test our approximate sampling method on simulated and biological ArrayCGH datasets and high-density SNP arrays, and demonstrate speed-ups of 10 to 60 and of 90, respectively, while achieving results competitive with state-of-the-art Bayesian approaches.

Availability: An implementation of our method will be made available as part of the open source GHMM library from http://ghmm.org.

Highlights

Hidden Markov Models (HMM) are often used for analyzing Comparative Genomic Hybridization (CGH) data to identify chromosomal aberrations or copy number variations by segmenting observation sequences.
Once a model is trained from the data, either using maximum likelihood (ML) or maximum a posteriori (MAP), the segmentation is given by the most likely state sequence obtained with the Viterbi algorithm [14].
ML or MAP point estimates of HMM parameters, combined with the Viterbi algorithm to compute a most likely sequence of hidden states and a segmentation of the input, are most popular in practice.

Summary

Introduction

Hidden Markov Models (HMM) are often used for analyzing Comparative Genomic Hybridization (CGH) data to identify chromosomal aberrations or copy number variations by segmenting observation sequences. For efficiency reasons the parameters of an HMM are often estimated with maximum likelihood and a segmentation is obtained with the Viterbi algorithm. This introduces considerable uncertainty in the segmentation, which can be avoided with Bayesian approaches integrating out parameters using Markov Chain Monte Carlo (MCMC) sampling. Once a model is trained from the data, either using maximum likelihood (ML) or maximum a posteriori (MAP), the segmentation is given by the most likely state sequence obtained with the Viterbi algorithm [14]. As analytic integration of a complex high-dimensional model is impossible for most distributions, the Bayesian approach requires the use of numerical integration techniques like MCMC [15], for example by direct Gibbs sampling [16] of model parameters and state paths. Maximal compression is to be expected for a small number of discrete symbols, and clearly the compression ratio conflicts with fidelity in the analysis.
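
To make the likelihood-based baseline concrete, the following is a minimal sketch of Viterbi segmentation for an HMM with Gaussian emissions. It is not the authors' implementation and does not use the GHMM API. The three states (loss, neutral, gain), their means and standard deviations, the transition probabilities, the toy log-ratio values, and the function names `gaussian_logpdf`, `viterbi` and `segments` are all assumptions chosen for illustration.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def viterbi(obs, log_init, log_trans, means, sds):
    """Most likely state path for a Gaussian-emission HMM (obs must be non-empty)."""
    n_states = len(means)
    # delta[j]: best log-probability of any path that ends in state j at the current position
    delta = [log_init[j] + gaussian_logpdf(obs[0], means[j], sds[j]) for j in range(n_states)]
    backptr = []
    for x in obs[1:]:
        prev = delta
        delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor state for state j at this position
            best_i = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            ptr.append(best_i)
            delta.append(prev[best_i] + log_trans[best_i][j]
                         + gaussian_logpdf(x, means[j], sds[j]))
        backptr.append(ptr)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def segments(path):
    """Collapse a state path into (start, end_exclusive, state) runs."""
    runs = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            runs.append((start, t, path[start]))
            start = t
    return runs


# Hypothetical parameters: state 0 = loss, 1 = neutral, 2 = gain (assumed values)
means = [-0.5, 0.0, 0.5]
sds = [0.1, 0.1, 0.1]
log_init = [math.log(1.0 / 3.0)] * 3
log_trans = [[math.log(0.9 if i == j else 0.05) for j in range(3)] for i in range(3)]

# Toy log-ratio sequence, invented for this example
obs = [0.02, -0.01, 0.03, 0.48, 0.52, 0.50, 0.01, -0.02, -0.51, -0.47]
path = viterbi(obs, log_init, log_trans, means, sds)
for start, end, state in segments(path):
    print(start, end, state)
```

The sketch returns a single path for fixed parameters. That is the point estimate whose uncertainty the Bayesian approach addresses by sampling parameters and state paths instead.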

Journal: BMC Bioinformatics
Publication Date: Nov 2, 2011
Citations: 82
License type: CC BY 2.0
