Abstract

Finding where transcription factors (TFs) bind to DNA is of key importance for deciphering gene regulation at the transcriptional level. Classically, computational prediction of TF binding sites (TFBSs) is based on position weight matrices (PWMs), which quantitatively score binding motifs based on the nucleotide patterns observed in a set of TFBSs for the corresponding TF. Such models make the strong assumption that each nucleotide participates independently in the corresponding DNA-protein interaction, and they do not account for flexible-length motifs. We introduce transcription factor flexible models (TFFMs) to represent TF binding properties. Based on hidden Markov models, TFFMs can model both position interdependence within TFBSs and variable-length motifs within a single dedicated framework. The availability of thousands of experimentally validated DNA-TF interaction sequences from ChIP-seq allows for the generation of models that perform as well as PWMs for stereotypical TFs and can improve performance for TFs with flexible binding characteristics. We present a new graphical representation of the motifs that conveys properties of position interdependence. TFFMs have been assessed on ChIP-seq data sets from the ENCODE project, revealing that they can perform better than both PWMs and the dinucleotide weight matrix extension in discriminating ChIP-seq sequences from background sequences. Under the assumption that ChIP-seq signal values are correlated with the affinity of TF-DNA binding, we find that TFFM scores correlate with ChIP-seq peak signals. Moreover, using available TF-DNA affinity measurements for the Max TF, we demonstrate that the scores of TFFMs constructed from ChIP-seq data correlate with published, experimentally measured DNA-binding affinities. Finally, TFFMs allow for the straightforward computation of an integrated TF occupancy score across a sequence.
These results demonstrate the capacity of TFFMs to accurately model DNA-protein interactions, while providing a single unified framework suitable for the next generation of TFBS prediction.
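
To make the independence assumption of the classical PWM concrete, the following minimal Python sketch builds a log-odds PWM from a handful of aligned sites and scores a window as a plain sum of per-position weights. It is an illustration only, not the TFFM method or the authors' implementation: the toy sites, the uniform background of 0.25, the pseudocount of 1, and the function names `build_pwm`, `score`, and `scan` are all assumptions made for this example.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from aligned sites of one fixed length.

    Assumptions for this sketch: uniform background, additive pseudocount.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        # A PWM has a fixed width, so variable-length motifs do not fit.
        raise ValueError("a PWM needs sites of one fixed length")
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2(counts[base] / total / background) for base in BASES}
        )
    return pwm


def score(pwm, window):
    """Score a window as a sum of per-position weights.

    Each position contributes independently of its neighbours; this is
    the assumption the abstract describes for PWMs.
    """
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence):
    """Score every window of PWM width along the sequence."""
    width = len(pwm)
    return [
        (offset, score(pwm, sequence[offset:offset + width]))
        for offset in range(len(sequence) - width + 1)
    ]


# Hypothetical toy sites, invented for illustration (not real ChIP-seq data).
toy_sites = ["CACGTG", "CACGTG", "CACGTT", "CATGTG"]
pwm = build_pwm(toy_sites)
hits = scan(pwm, "TTCACGTGAA")
best_offset, best_score = max(hits, key=lambda hit: hit[1])
```

Because the score is a sum over columns, a model of this kind cannot express that the preferred base at one position depends on the base at another, and its fixed width cannot absorb sites of differing length. These are the two limitations the hidden-Markov-model formulation is described as addressing.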

Highlights

  • Transcription factors (TFs) and their specific binding sites act to modulate the rate of gene transcription

  • Transcription factors are critical proteins for sequence-specific control of transcriptional regulation. Finding where these proteins bind to DNA is of key importance for global efforts to decipher the complex mechanisms of gene regulation

  • We introduce a novel statistical model for the prediction of transcription factor binding sites (TFBSs) tolerant of a broader range of TFBS configurations than can be conveniently accommodated by existing methods

Summary

Introduction

Transcription factors (TFs) and their specific binding sites act to modulate the rate of gene transcription. As TFs bind to DNA in a sequence-specific manner, computational methods for motif discrimination have been critically important for the prediction of transcription factor binding sites (TFBSs). TFBSs are usually short, and in most cases TFs tolerate sequence variation at many positions of the TFBS; as a result, motif-based scans of long sequences tend to produce many false positive predictions. Approaches to address the false prediction issue have varied: phylogenetic methods that focus on sequences conserved during evolution [3], experimentally mapped transcription start site data that restrict the search to promoter-proximal regions [4], histone modification or DNA accessibility data that highlight likely regulatory sequences [5], or a focus on locally dense combinations of motifs [6,7] defined from TFBS enrichment analysis of co-expressed genes
