Abstract

Background: Mammalian genomes contain millions of putative regulatory sequences, which are delineated by binding of multiple transcription factors. The degree to which spacing and orientation constraints among transcription factor binding sites contribute to the recognition and identity of regulatory sequence is an unresolved but important question that impacts our understanding of genome function and evolution. Global mechanisms that underlie phenomena including the size of regulatory sequences, their uniqueness, and their evolutionary turnover remain poorly described.

Results: Here, we ask whether models incorporating different degrees of spacing and orientation constraints among transcription factor binding sites are broadly consistent with several global properties of regulatory sequence. These properties include length, sequence diversity, turnover rate, and dominance of specific TFs in regulatory site identity and cell type specification. Models with and without spacing and orientation constraints are generally consistent with all observed properties of regulatory sequence, and with regulatory sequences being fundamentally small (~1 nucleosome). Uniqueness of regulatory regions and their rapid evolutionary turnover are expected under all models examined. An intriguing issue we identify is that the complexity of eukaryotic regulatory sites must scale with the number of active transcription factors, in order to accomplish observed specificity.

Conclusions: Models of transcription factor binding with or without spacing and orientation constraints predict that regulatory sequences should be fundamentally short, unique, and turn over rapidly. We posit that the existence of master regulators may be, in part, a consequence of evolutionary pressure to limit the complexity and increase evolvability of regulatory sites.

Highlights

  • Understanding how regulatory sequence operates is central to understanding the function and evolution of genomes

  • Basset [16], in contrast, employs a convolutional neural network (CNN) in which experimentally identified regulatory sequences are “one-hot encoded” as inputs; the first convolutional layer is then formed by scanning position weight matrix (PWM)-like filters, which mimic transcription factor (TF) motifs, across the sequence input

  • Specific data types used as standards differed according to the models and properties being examined, as described below; overall, we present DNase and ChIP data in the embryonic stem cell-H1 (ESC-H1) cell line from the Roadmap Epigenome and ENCODE projects [4, 32]
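
The one-hot encoding and filter scanning mentioned in the Basset highlight above can be illustrated with a minimal sketch. This is not Basset's code: in a CNN the filter weights are learned during training, whereas the filter here is hand-written, and the names `one_hot`, `scan_filter`, and `toy_filter` are hypothetical and chosen only for illustration.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Bases outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def scan_filter(encoded, weights):
    """Slide a PWM-like filter along the sequence, one score per window.

    `weights` has one 4-element row per filter position, so each score is
    the sum of the weights selected by the bases in that window.
    """
    width = len(weights)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            for channel in range(4):
                score += encoded[start + offset][channel] * weights[offset][channel]
        scores.append(score)
    return scores

# Hypothetical 3-bp filter that rewards the motif "TGA"; values are illustrative only.
toy_filter = [
    [-1.0, -1.0, -1.0, 2.0],   # position 1 favours T
    [-1.0, -1.0, 2.0, -1.0],   # position 2 favours G
    [2.0, -1.0, -1.0, -1.0],   # position 3 favours A
]

scores = scan_filter(one_hot("ACTGAC"), toy_filter)
best_start = scores.index(max(scores))  # 0-based start of the best-scoring window
```

In this toy case the highest score falls on the window that spells "TGA". A convolutional layer applies many such filters in parallel and passes the resulting score tracks to later layers.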


Summary

Introduction

Understanding how regulatory sequence operates is central to understanding the function and evolution of genomes. Large-scale surveys for DNase-I hypersensitivity (DHS) and histone marks have yielded similar numbers of elements (3.5 million DHS sites and 2.3 million enhancers, respectively) [3, 4]. These elements are often active only in specific tissues and cell types (typically 100,000–200,000 in any given sample), complicating functional tests.

