Abstract
Transcription factors (TFs) often function as a module including both master factors and mediators binding at cis-regulatory regions to modulate nearby gene transcription. ChIP-seq profiling of multiple TFs makes it feasible to infer functional TF modules. However, when inferring TF modules based on co-localization of ChIP-seq peaks, often many weak binding events are missed, especially for mediators, resulting in incomplete identification of modules. To address this problem, we develop a ChIP-seq data-driven Gibbs Sampler to infer Modules (ChIP-GSM) using a Bayesian framework that integrates ChIP-seq profiles of multiple TFs. ChIP-GSM samples read counts of module TFs iteratively to estimate the binding potential of a module to each region and, across all regions, estimates the module abundance. Using inferred module-region probabilistic bindings as feature units, ChIP-GSM then employs logistic regression to predict active regulatory elements. Validation of ChIP-GSM predicted regulatory regions on multiple independent datasets sharing the same context confirms the advantage of using TF modules for predicting regulatory activity. In a case study of K562 cells, we demonstrate that the ChIP-GSM inferred modules form as groups, activate gene expression at different time points, and mediate diverse functional cellular processes. Hence, ChIP-GSM infers biologically meaningful TF modules and improves the prediction accuracy of regulatory region activities.
Highlights
DNA-binding proteins like transcription factors (TFs) usually function coordinatively as a module, a molecular complex that binds at cis-regulatory regions to modulate the expression of nearby genes
To reliably infer TF modules, here we describe ChIP-GSM, a Gibbs sampler built upon a Bayesian framework, that can further predict active regulatory elements
Background regions that contain amplified open chromatin regions can have high read counts in a TF ChIP-seq profile [21]. These regions will confound the correct identification of TF-bound regions, especially regions with weak binding events
Summary
DNA-binding proteins like transcription factors (TFs) usually function coordinatively as a module, a molecular complex that binds at cis-regulatory regions to modulate the expression of nearby genes These modules consist of both master factors and mediators [1,2]. SignalSpider [6] modeled ChIP-seq read coverage using Gaussian mixture distributions and decipher the combinatorial TF binding events in a layered hierarchical probabilistic framework. In these methods, the input ChIP-seq data, which play a key role in modeling background regions and have been demonstrated important for accurate TF binding events identification [10,11,12,13], were not used, so the amplified background regions that could produce high read counts and confound the identification of binding events, especially weak ones, were not modeled. We need an approach to infer TF modules efficiently and accurately in a wide range of applications
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.