A Bayesian hidden Markov model for detecting differentially methylated regions

Tieming Ji

doi:10.1111/biom.13000

Abstract

Alterations in DNA methylation have been linked to the development and progression of many diseases. Bisulfite sequencing profiles DNA methylation at single-base resolution: the counts of methylated and unmethylated reads at each CpG site carry information about that site's methylation level. As bisulfite sequencing data accumulate, they are increasingly used to infer disease-associated methylation aberrations, and automated, powerful algorithms are needed to accurately identify differentially methylated regions between treatment groups. This study adopts a Bayesian approach based on a hidden Markov model to account for the inherent dependence in read count data. Because sequencing experiments are expensive, few replicates are available for each treatment group; a Bayesian approach that borrows information across an entire chromosome makes the resulting statistical inferences more reliable. The proposed hidden Markov model captures dependence among genomic loci through correlation structures that are functions of genomic distance. Parameters are estimated with an iterative algorithm based on expectation-maximization, and methylation states are inferred by identifying the optimal sequence of latent states given the observations. Real datasets, together with simulation studies that mimic them, illustrate the reliability and effectiveness of the proposed method.
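The abstract does not state the model's exact transition or emission forms, so the following is only a minimal Python sketch under stated assumptions, not the author's model. It assumes two hidden states with binomial read-count emissions for two treatment groups, and a hypothetical transition matrix in which the probability of keeping the previous state decays exponentially with the genomic distance between neighbouring CpG sites. It illustrates only the decoding step (finding the most probable latent-state sequence with the Viterbi algorithm); the methylation levels, the stationary distribution `pi`, and the length `scale` are set by hand rather than estimated by expectation-maximization, and all function names are hypothetical.

```python
import math

def log_binom_pmf(k, n, p):
    """Log-probability of k methylated reads out of n at methylation level p (0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def transition_matrix(d, pi, scale):
    """Hypothetical distance-dependent transition matrix.

    w = exp(-d / scale) is the probability of carrying the previous state over;
    otherwise the next state is redrawn from pi. Each row sums to 1.
    Assumes d > 0 (strictly increasing positions) so every entry is positive.
    """
    w = math.exp(-d / scale)
    K = len(pi)
    return [[w * (i == j) + (1.0 - w) * pi[j] for j in range(K)] for i in range(K)]

def log_emission(site, levels):
    """Joint log-likelihood of both groups' counts at one CpG site.

    site = (m1, n1, m2, n2): methylated / total reads in group 1 and group 2.
    levels = (p1, p2): the state's assumed methylation level in each group.
    """
    m1, n1, m2, n2 = site
    p1, p2 = levels
    return log_binom_pmf(m1, n1, p1) + log_binom_pmf(m2, n2, p2)

def viterbi(positions, sites, state_levels, pi, scale):
    """Most probable hidden-state sequence, computed in log space."""
    K = len(state_levels)
    # delta[k]: best log-probability of any state path ending in state k at the current site
    delta = [math.log(pi[k]) + log_emission(sites[0], state_levels[k]) for k in range(K)]
    backptr = []
    for t in range(1, len(sites)):
        A = transition_matrix(positions[t] - positions[t - 1], pi, scale)
        new_delta, ptr = [], []
        for j in range(K):
            scores = [delta[i] + math.log(A[i][j]) for i in range(K)]
            best = max(range(K), key=scores.__getitem__)
            ptr.append(best)
            new_delta.append(scores[best] + log_emission(sites[t], state_levels[j]))
        delta = new_delta
        backptr.append(ptr)
    # Trace the best path backwards from the most probable final state
    state = max(range(K), key=delta.__getitem__)
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Illustrative parameters: state 0 = no difference (both groups ~70% methylated),
# state 1 = group 2 hypomethylated relative to group 1.
levels = [(0.7, 0.7), (0.8, 0.2)]
pi = [0.9, 0.1]
positions = [100, 120, 150, 170, 200, 5000]
sites = [(7, 10, 7, 10), (8, 10, 7, 10), (8, 10, 2, 10),
         (9, 10, 1, 10), (8, 10, 2, 10), (7, 10, 7, 10)]
path = viterbi(positions, sites, levels, pi, scale=100.0)
print(path)  # [0, 0, 1, 1, 1, 0]
```

The three middle sites, where group 2 has few methylated reads, are decoded as one contiguous candidate region. The 4.8 kb gap before the last site makes `w` almost zero, so the state there is decided almost entirely by its own counts and `pi`. In a full analysis these parameters would be learned from the data (for example by an EM-type algorithm, as the abstract describes) rather than fixed by hand.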
