Abstract

An important question in biology is how different promoter-architectures contribute to the diversity in regulation of transcription initiation. A step forward has been the production of genome-wide maps of transcription start sites (TSSs) using high-throughput sequencing. However, the subsequent step of characterizing promoters and their functions is still largely done on the basis of previously established promoter-elements like the TATA-box in eukaryotes or the -10 box in bacteria. Unfortunately, a majority of promoters and their activities cannot be explained by these few elements. Traditional motif discovery methods that identify novel elements also fail here, because TSS neighborhoods are often highly heterogeneous, containing no overrepresented motif. We present a new, organism-independent method that explicitly models this heterogeneity while unraveling different promoter-architectures. For example, in five bacteria, we detect the presence of a pyrimidine preceding the TSS under very specific circumstances. In tuberculosis, we show for the first time that the spacing between the bacterial -10 motif and the TSS is utilized by the pathogen for dynamic gene-regulation. In eukaryotes, we identify several new elements that are important for development. Identified promoter-architectures show differential patterns of evolution, chromatin structure and TSS spread, suggesting distinct regulatory functions. This work highlights the importance of characterizing heterogeneity within high-throughput genomic data rather than analyzing average patterns of nucleotide composition.

Highlights

  • The last decade has seen remarkable advances in high-throughput sequencing technologies, making them both fast and cost-effective

  • We demonstrate its utility in identifying novel promoter-architectures in three different species of bacteria: M. tuberculosis, E. coli and K. pneumoniae, as well as in two eukaryotes: fly and human

  • We define the problem of identifying hidden promoter-architectures as one of finding an optimal partitioning of promoter sequences, where each partition is characterized by a different distribution over the alphabet {A,C,G,T}

Introduction

The last decade has seen remarkable advances in high-throughput sequencing technologies, making them both fast and cost-effective. In a given cell-type of interest, methods like cap analysis of gene expression (CAGE) [1], oligo-capping [2], cap-trapping [3] and Rapid Amplification of cDNA Ends (5′-RACE) [4] coupled with high-throughput sequencing identify transcription start sites (TSSs) associated with the transcriptome. These methods differ in the manner in which they distinguish a true site of initiation from a 5′ end generated by RNA cleavage or degradation [5], but they typically produce robust genome-wide maps of TSSs [6]. Identification of such “modules” has been attempted before [12,13], but their success, again, depends on which features were considered while building modules in the first place.
