Predicting transcription factor site occupancy using DNA sequence intrinsic and cell-type specific chromatin features.

Sunil Kumar, Philipp Bucher

doi:10.1186/s12859-015-0846-z

Abstract

Background: Understanding the mechanisms by which transcription factors (TF) are recruited to their physiological target sites is crucial for understanding gene regulation. DNA sequence intrinsic features such as predicted binding affinity are often not very effective in predicting in vivo site occupancy and in any case could not explain cell-type specific binding events. Recent reports show that chromatin accessibility, nucleosome occupancy and specific histone post-translational modifications greatly influence TF site occupancy in vivo. In this work, we use machine-learning methods to build predictive models and assess the relative importance of different sequence-intrinsic and chromatin features in the TF-to-target-site recruitment process.

Methods: Our study primarily relies on recent data published by the ENCODE consortium. Five dissimilar TFs assayed in multiple cell-types were selected as examples: CTCF, JunD, REST, GABP and USF2. We used two types of candidate target sites: (a) predicted sites obtained by scanning the whole genome with a position weight matrix, and (b) cell-type specific peak lists provided by ENCODE. Quantitative in vivo occupancy levels in different cell-types were based on ChIP-seq data for the corresponding TFs. In parallel, we computed a number of associated sequence-intrinsic and experimental features (histone modification, DNase I hypersensitivity, etc.) for each site. Machine learning algorithms were then used in a binary classification and regression framework to predict site occupancy and binding strength, for the purpose of assessing the relative importance of different contextual features.

Results: We observed striking differences in the feature importance rankings between the five factors tested. PWM-scores were amongst the most important features only for CTCF and REST but of little value for JunD and USF2. Chromatin accessibility and active histone marks are potent predictors for all factors except REST. Structural DNA parameters, repressive and gene body associated histone marks are generally of little or no predictive value.

Conclusions: We define a general and extensible computational framework for analyzing the importance of various DNA-intrinsic and chromatin-associated features in determining cell-type specific TF binding to target sites. The application of our methodology to ENCODE data has led to new insights on transcription regulatory processes and may serve as example for future studies encompassing even larger datasets.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0846-z) contains supplementary material, which is available to authorized users.
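The Methods describe candidate sites of type (a) as obtained by scanning the genome with a position weight matrix (PWM). The sketch below illustrates that general idea only; it is not the authors' pipeline. The count matrix, the pseudocount, the uniform background frequency and the score threshold are all hypothetical values chosen for the example, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

# Hypothetical base counts for a 4-bp motif with consensus ACGT (illustration only).
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 8, "G": 1, "T": 1},
    {"A": 1, "C": 0, "G": 8, "T": 1},
    {"A": 1, "C": 1, "G": 0, "T": 8},
]


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append(
            {b: math.log2(((column[b] + pseudocount) / total) / background) for b in BASES}
        )
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for each forward-strand window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in BASES for base in window):
            continue
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


pwm = log_odds_pwm(COUNTS)
hits = scan("TTACGTTT", pwm, threshold=4.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

A real genome scan would also score the reverse complement and would typically derive the threshold from a score distribution or p-value rather than a fixed number.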

Highlights

Understanding the mechanisms by which transcription factors (TF) are recruited to their physiological target sites is crucial for understanding gene regulation.
As we found again that Support Vector Machines (SVM) with radial kernel performed better than the other two methods tested, we used this method here and in all subsequent analyses described in this paper.
There are 30,073 common CTCF sites (18.6 %) out of 161,438 sites in total. We extended this type of regression-based machine modeling framework for predicting tag counts to three additional factors (REST, GABP and USF2) and three additional cell types (GM12878, HeLa and HepG2).

Summary

Introduction

Understanding the mechanisms by which transcription factors (TF) are recruited to their physiological target sites is crucial for understanding gene regulation. DNA sequence intrinsic features such as predicted binding affinity are often not very effective in predicting in vivo site occupancy and in any case could not explain cell-type specific binding events. Genes are regulated by transcription factors (TF) binding to physiological target sites in the genome. Research in this area has been hampered by the lack of powerful assays to study TF binding events in vivo. This has drastically changed with the advent of the ChIP-seq technology, which allows for comprehensive, genome-wide mapping of all in vivo bound sites of a given TF in a particular cell type at near base-pair resolution [2]. The recruitment of TFs to target sites depends on both DNA-intrinsic properties and cell type specific covariates.

Journal: BMC Bioinformatics
Publication Date: Jan 11, 2016
Citations: 30
License type: cc-by

Similar Papers

Author response: A genome-wide view of the de-differentiation of central nervous system endothelial cells in culture
Mark F Sabbagh ... Jeremy Nathans
20 Nov 2019

Sequence and chromatin determinants of transcription factor binding and the establishment of cell type-specific binding patterns
Divyanshi Srivastava ... Shaun Mahony
Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms | VOL. 1863
19 Oct 2019

Integrative model of genomic factors for determining binding site selection by estrogen receptor‐α
Roy Joseph ... Leena Ukil
Molecular Systems Biology | VOL. 6
01 Jan 2009

Transcription Factors and DNA Play Hide and Seek.
David M Suter
Trends in Cell Biology | VOL. 30
07 Apr 2020
