A Novel Bayesian Change-point Algorithm for Genome-wide Analysis of Diverse ChIPseq Data Types

Haipeng Xing, Willey Liao, Yifan Mo, Michael Q Zhang

doi:10.3791/4273

Abstract

ChIPseq is a widely used technique for investigating protein-DNA interactions. Read density profiles are generated by next-generation sequencing of protein-bound DNA and alignment of the short reads to a reference genome. Enriched regions are revealed as peaks, which often differ dramatically in shape depending on the target protein(1). For example, transcription factors often bind in a site- and sequence-specific manner and tend to produce punctate peaks, while histone modifications are more pervasive and are characterized by broad, diffuse islands of enrichment(2). Reliably identifying these regions was the focus of our work. Algorithms for analyzing ChIPseq data have employed various methodologies, from heuristics(3-5) to more rigorous statistical models, e.g., Hidden Markov Models (HMMs)(6-8). We sought a solution that minimized the need for difficult-to-define, ad hoc parameters, which often compromise resolution and make a tool less intuitive to use. With respect to HMM-based methods, we aimed to curtail the parameter estimation procedures and the simple, finite-state classifications that are often used. Additionally, conventional ChIPseq data analysis involves categorizing the expected read density profile as either punctate or diffuse and then applying the appropriate tool. We further aimed to replace these two distinct models with a single, more versatile model that can address the entire spectrum of data types. To meet these objectives, we first constructed a statistical framework that naturally models ChIPseq data structures using a cutting-edge advance in HMMs(9) that relies only on explicit formulas, an innovation crucial to its performance advantages. More sophisticated than heuristic models, our HMM accommodates infinite hidden states through a Bayesian model. We applied it to identify reasonable change points in read density, which in turn define segments of enrichment.
Our analysis showed that our Bayesian Change Point (BCP) algorithm has reduced computational complexity, as evidenced by a shorter run time and a smaller memory footprint. The BCP algorithm was successfully applied to both punctate peak and diffuse island identification with robust accuracy and few user-defined parameters, illustrating both its versatility and its ease of use. Consequently, we believe it can be implemented readily across a broad range of data types and end users in a manner that is easily compared and contrasted, making it a valuable tool for ChIPseq data analysis that can aid collaboration and corroboration between research groups. Here, we demonstrate the application of BCP to existing transcription factor(10,11) and epigenetic data(12) to illustrate its usefulness.
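
To make the idea of a change point in read density concrete, the following is a minimal toy sketch and not the BCP algorithm itself. It assumes binned read counts are Poisson distributed with a Gamma prior on the rate, so the rate can be integrated out with an explicit formula, and it scores only a single change point under a uniform prior on its position. The function names, the prior parameters a and b, and the counts are all hypothetical and chosen only for illustration.

```python
import math


def log_marginal(total, n, a=1.0, b=1.0):
    """Log marginal likelihood of n Poisson counts summing to `total`,
    with the rate integrated out under a Gamma(shape=a, rate=b) prior.

    The product of 1/x_i! terms is omitted: it is identical for every
    candidate split, so it cancels when the posterior is normalized.
    """
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + total) - (a + total) * math.log(b + n))


def change_point_posterior(counts, a=1.0, b=1.0):
    """Posterior over one change point k, splitting counts[:k] | counts[k:].

    Assumes a uniform prior on k in 1..len(counts)-1 (an illustrative choice).
    """
    n = len(counts)
    total = sum(counts)
    scores = []
    left = 0
    for k in range(1, n):
        left += counts[k - 1]  # running sum of the left segment
        scores.append(log_marginal(left, k, a, b)
                      + log_marginal(total - left, n - k, a, b))
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    norm = sum(weights)
    return {k: w / norm for k, w in zip(range(1, n), weights)}


# Hypothetical binned read counts: low background, then an enriched segment.
counts = [1, 0, 2, 1, 1, 0, 1, 2, 9, 11, 10, 8, 12, 10]
posterior = change_point_posterior(counts)
best_k = max(posterior, key=posterior.get)
print(best_k, round(posterior[best_k], 3))
```

For these toy counts the posterior concentrates on k = 8, the boundary between the background bins and the enriched bins. A genome-wide method has to handle an unknown number of change points and segment-specific rates, which this sketch does not attempt.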

Journal: Journal of Visualized Experiments	Publication Date: Dec 10, 2012
Citations: 5

Similar Papers

Genome-Wide Localization of Protein-DNA Binding and Histone Modification by a Bayesian Change-Point Method with ChIP-seq Data
Haipeng Xing ... Will Liao
PLoS Computational Biology | VOL. 8
26 Jul 2012

The Bayesian Change Point and Variable Selection Algorithm: Application to the δ18O Proxy Record of the Plio-Pleistocene
Eric Ruggieri ... Charles E Lawrence
Journal of Computational and Graphical Statistics | VOL. 23
02 Jan 2014

Integrative Analysis of Histone ChIP-seq and RNA-seq Data.
Hans‐Ulrich Klein ... Martin Schäfer
Current protocols in human genetics | VOL. 90
01 Jul 2016

Integrative analysis of histone ChIP-seq and transcription data using Bayesian mixture models
Hans-Ulrich Klein ... Martin Schäfer
Bioinformatics | VOL. 30
07 Jan 2014
