Abstract

An essential component of genome function is the syntax of genomic regulatory elements that determine how diverse transcription factors interact to orchestrate a program of regulatory control. A precise characterization of in vivo spacing constraints between key transcription factors would reveal important aspects of this genomic regulatory language. To discover novel transcription factor spatial binding constraints in vivo, we developed a new integrative computational method, genome-wide event finding and motif discovery (GEM). GEM resolves ChIP data into explanatory motifs and binding events at high spatial resolution by linking binding event discovery and motif discovery with positional priors in the context of a generative probabilistic model of ChIP data and genome sequence. GEM analysis of 63 transcription factors in 214 ENCODE human ChIP-Seq experiments recovers more known factor motifs than other contemporary methods, and discovers six new motifs for factors with unknown binding specificity. GEM's adaptive learning of binding-event read distributions allows it to further improve upon previous methods for processing ChIP-Seq and ChIP-exo data to yield unsurpassed spatial resolution and discovery of closely spaced binding events of the same factor. In a systematic analysis of in vivo sequence-specific transcription factor binding using GEM, we have found hundreds of spatial binding constraints between factors. GEM found 37 examples of factor binding constraints in mouse ES cells, including strong distance-specific constraints between Klf4 and other key regulatory factors. In human ENCODE data, GEM found 390 examples of spatially constrained pairwise binding, including such novel pairs as c-Fos:c-Jun/USF1, CTCF/Egr1, and HNF4A/FOXA1.
The discovery of new factor-factor spatial constraints in ChIP data is significant because it proposes testable models for regulatory factor interactions that will help elucidate genome function and the implementation of combinatorial control.
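
As a rough illustration of what a distance-specific binding constraint between two factors looks like in data, the sketch below tabulates signed offsets between nearby binding events of two factors and reports the most frequent offset. This is not GEM's algorithm or its statistical test. The function names, the 100 bp window, and the toy coordinates are all assumptions made for this example only.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def offset_histogram(events_a, events_b, max_dist=100):
    """Count signed offsets (b - a) for event pairs at most max_dist bp apart.

    events_a, events_b: integer binding positions on the same chromosome
    (hypothetical inputs; real event calls would come from a peak caller).
    """
    sorted_b = sorted(events_b)
    counts = Counter()
    for a in events_a:
        # Binary search for the slice of B events inside the window around a.
        lo = bisect_left(sorted_b, a - max_dist)
        hi = bisect_right(sorted_b, a + max_dist)
        for b in sorted_b[lo:hi]:
            counts[b - a] += 1
    return counts


def most_frequent_offset(counts, max_dist=100):
    """Return (offset, count, ratio) for the most frequent offset, or None.

    ratio compares the top count with the mean count per offset bin,
    a crude enrichment score and not a significance test.
    """
    if not counts:
        return None
    # Ties are broken in favour of the offset closest to zero.
    offset, count = max(counts.items(), key=lambda kv: (kv[1], -abs(kv[0])))
    n_bins = 2 * max_dist + 1
    mean_per_bin = sum(counts.values()) / n_bins
    return offset, count, count / mean_per_bin


# Toy data (made up): factor B tends to bind 12 bp downstream of factor A.
factor_a_events = [1000, 5000, 9000, 15000]
factor_b_events = [1012, 5012, 9012, 15040]

histogram = offset_histogram(factor_a_events, factor_b_events)
top = most_frequent_offset(histogram)
print(top[0], top[1])
```

A sharp peak at one offset suggests a fixed spacing between the two factors' binding sites, whereas a flat histogram suggests co-occurrence without a positional constraint. A real analysis would need base-pair resolution event calls, strand and motif orientation, and a proper background model.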

Highlights

  • Genomic sequences facilitate both cooperative and competitive regulatory factor-factor interactions that implement cellular transcriptional regulatory logic

  • We used a human Growth Associated Binding Protein (GABP) ChIP-Seq dataset for our evaluation because GABP ChIP-Seq data were previously reported to contain homotypic events where the reads generated by multiple closely spaced binding events overlap [5]

  • We evaluated the methods using ChIP-Seq data from the insulator binding factor CTCF (CCCTC-binding factor) [16], as it binds to a stronger motif than GABP

Introduction

Genomic sequences facilitate both cooperative and competitive regulatory factor-factor interactions that implement cellular transcriptional regulatory logic. The functional syntax of DNA motifs in regulatory elements is an essential component of cellular regulatory control. Spaced motifs can facilitate cooperative homo-dimeric or hetero-dimeric factor binding, while overlapping motifs can implement competitive binding by steric hindrance. Cooperative and competitive binding are an integral part of complex cellular regulatory logic functions [1,2]. The binding of regulatory proteins to the genome cannot at present be predicted from primary DNA sequence alone, as chromatin structure, co-factors, and other mechanisms make the prediction of in vivo binding from sequence empirically unreliable [3]. It is not possible to use primary DNA sequence to determine the aspects of genome syntax that are employed in vivo.
