Abstract

Transcription factors (TFs) often function as a module, including both master factors and mediators, that binds at cis-regulatory regions to modulate nearby gene transcription. ChIP-seq profiling of multiple TFs makes it feasible to infer functional TF modules. However, when TF modules are inferred from co-localization of ChIP-seq peaks, many weak binding events are often missed, especially for mediators, resulting in incomplete identification of modules. To address this problem, we develop a ChIP-seq data-driven Gibbs Sampler to infer Modules (ChIP-GSM) using a Bayesian framework that integrates ChIP-seq profiles of multiple TFs. ChIP-GSM samples read counts of module TFs iteratively to estimate the binding potential of a module to each region and, across all regions, estimates the module abundance. Using inferred module-region probabilistic bindings as feature units, ChIP-GSM then employs logistic regression to predict active regulatory elements. Validation of ChIP-GSM-predicted regulatory regions on multiple independent datasets sharing the same context confirms the advantage of using TF modules for predicting regulatory activity. In a case study of K562 cells, we demonstrate that the ChIP-GSM-inferred modules form groups, activate gene expression at different time points, and mediate diverse functional cellular processes. Hence, ChIP-GSM infers biologically meaningful TF modules and improves the prediction accuracy of regulatory region activities.
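
The last modeling step named in the abstract, logistic regression on module-region binding probabilities, can be illustrated with a minimal sketch. This is not the ChIP-GSM implementation: the feature matrix is synthetic toy data, the module count, weights, and learning rate are arbitrary assumptions, and the helper names (`fit_logistic`, `predict_proba`) are hypothetical. It only shows how per-region module binding probabilities could serve as features for predicting whether a region is active.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping real values to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Fit logistic regression by batch gradient descent.

    X: (n_regions, n_modules) feature matrix.
    y: (n_regions,) binary labels (1 = active region).
    Returns the learned weights and bias.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / n  # gradient of the mean log-loss w.r.t. w
        b -= lr * np.mean(p - y)       # gradient of the mean log-loss w.r.t. b
    return w, b


def predict_proba(X, w, b):
    """Predicted probability that each region is active."""
    return sigmoid(X @ w + b)


# Synthetic toy data (assumption): 400 regions, 3 hypothetical modules.
# Each entry stands in for an inferred module-region binding probability.
rng = np.random.default_rng(0)
n_regions, n_modules = 400, 3
X = rng.uniform(0.0, 1.0, size=(n_regions, n_modules))

# Assumed ground truth for the toy example: module 0 promotes activity,
# module 1 is irrelevant, module 2 is associated with inactivity.
true_w = np.array([4.0, 0.0, -2.0])
true_b = -1.0
y = (rng.uniform(size=n_regions) < sigmoid(X @ true_w + true_b)).astype(float)

w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)
accuracy = np.mean((probs > 0.5) == (y == 1.0))
print("learned weights:", np.round(w, 2), "bias:", round(float(b), 2))
print("training accuracy:", round(float(accuracy), 3))
```

In this toy setting the sign of each learned weight indicates whether binding by the corresponding module is associated with active or inactive regions, which is one reason module-level features are easy to interpret in a linear model.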

Highlights

  • DNA-binding proteins like transcription factors (TFs) usually function coordinatively as a module, a molecular complex that binds at cis-regulatory regions to modulate the expression of nearby genes

  • To reliably infer TF modules, here we describe ChIP-GSM, a Gibbs sampler built upon a Bayesian framework, that can further predict active regulatory elements

  • Background regions that contain amplified open chromatin regions can have high read counts in a TF ChIP-seq profile [21]. These regions will confound the correct identification of TF-bound regions, especially regions with weak binding events

Introduction

DNA-binding proteins like transcription factors (TFs) usually function coordinatively as a module, a molecular complex that binds at cis-regulatory regions to modulate the expression of nearby genes. These modules consist of both master factors and mediators [1,2]. SignalSpider [6] modeled ChIP-seq read coverage using Gaussian mixture distributions and deciphered combinatorial TF binding events in a layered hierarchical probabilistic framework. In these methods, the input ChIP-seq data, which play a key role in modeling background regions and have been shown to be important for accurate identification of TF binding events [10,11,12,13], were not used. As a result, amplified background regions, which can produce high read counts and confound the identification of binding events (especially weak ones), were not modeled. An approach is therefore needed that infers TF modules efficiently and accurately across a wide range of applications.
