Abstract

Interactions of transcription factors (TFs) with DNA comprise a complex interplay between base-specific amino acid contacts and readout of DNA structure. Recent studies have highlighted the complementarity of DNA sequence and shape in modeling TF binding in vitro. Here, we provide a comprehensive evaluation of in vivo datasets to assess the predictive power obtained by augmenting various DNA sequence-based models of TF binding sites (TFBSs) with DNA shape features (helix twist, minor groove width, propeller twist, and roll). Results from 400 human ChIP-seq datasets for 76 TFs show that combining DNA shape features with position-specific scoring matrix (PSSM) scores improves TFBS predictions. Improvement is also observed with TF flexible models and with a machine-learning approach that uses a binary encoding of nucleotides in lieu of PSSMs. Incorporating DNA shape information is most beneficial for the E2F and MADS-domain TF families. Our findings indicate that incorporating both DNA sequence and shape information benefits the modeling of TF binding under complex in vivo conditions.
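As a rough illustration of the two sequence representations mentioned above, the following Python sketch scores a candidate site with a PSSM, builds a binary (one-hot) encoding of its nucleotides, and appends per-position values for the four shape features to the PSSM score. It is a minimal sketch under stated assumptions, not the study's pipeline: the count matrix, pseudocount, uniform background, function names, and shape values are all hypothetical placeholders. In practice, shape values would come from a DNA shape prediction tool, and the resulting vectors would be passed to a classifier.

```python
import math

BASES = "ACGT"
SHAPE_FEATURES = ("helix_twist", "minor_groove_width", "propeller_twist", "roll")


def pssm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores.

    Assumes a uniform background and a fixed pseudocount (illustrative choices).
    """
    pssm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pssm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pssm


def pssm_score(pssm, site):
    """Sum the log-odds of the observed base at each position."""
    return sum(column[base] for column, base in zip(pssm, site))


def one_hot(site):
    """Binary encoding: four bits per nucleotide, in A, C, G, T order."""
    return [1 if base == b else 0 for base in site for b in BASES]


def combined_features(pssm, site, shape):
    """Concatenate the PSSM score with per-position shape values."""
    features = [pssm_score(pssm, site)]
    for name in SHAPE_FEATURES:
        features.extend(shape[name])
    return features


# Toy 3-bp motif with consensus "ACG" (made-up counts).
toy_counts = [
    {"A": 7, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 7, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 7, "T": 0},
]
toy_pssm = pssm_from_counts(toy_counts)

# Placeholder shape values, one per position; not real predictions.
toy_shape = {name: [0.0, 0.0, 0.0] for name in SHAPE_FEATURES}

sequence_plus_shape = combined_features(toy_pssm, "ACG", toy_shape)
binary_encoding = one_hot("ACG")
```

The first vector holds one PSSM score followed by four shape values per position, while the binary encoding replaces the single score with four bits per nucleotide. Either vector could be extended with the other's columns, depending on which sequence model is being augmented.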

Highlights

  • One of many mechanisms that control gene expression, transcriptional regulation involves transcription factors (TFs) as key proteins (Jacob and Monod, 1961; Ptashne and Gann, 1997)

  • Large-scale data derived from high-throughput (HT) experiments highlight higher-order positional interaction features of TF binding sites (TFBSs) that cannot be captured by classical position-specific scoring matrices (PSSMs), even though methods based on these traditional models perform quite well (Weirauch et al., 2013)

  • E2F and MADS-domain TF families benefit most from DNA shape information: motivated by the observation that DNA structural information improves the prediction of TFBSs for some ChIP-seq datasets more than others (Figure 2C; Data S2 and S3), we investigated whether predictions for certain TF families with similar DNA-binding domains benefit from incorporating DNA shape information

Graphical Abstract

The study confirms the importance of considering DNA shape features when modeling TF binding profiles in vivo. Among the TF families with available data, DNA shape features are most critical, in a position-specific manner, for E2F and MADS-domain TF binding. Cell Systems 3, 278–286, September 28, 2016. © 2016 The Author(s).
