Abstract

Background: Transcription factors (TFs) form complexes that bind regulatory modules (RMs) within DNA, to control specific sets of genes. Some transcription factor binding sites (TFBSs) near the transcription start site (TSS) display tight positional preferences relative to the TSS. Furthermore, near the TSS, RMs can co-localize TFBSs with each other and the TSS. The proportion of TFBS positional preferences due to TFBS co-localization within RMs is unknown, however. ChIP experiments confirm co-localization of some TFBSs genome-wide, including near the TSS, but they typically examine only a few TFs at a time, using non-physiological conditions that can vary from lab to lab. In contrast, sequence analysis can examine many TFs uniformly and methodically, broadly surveying the co-localization of TFBSs with tight positional preferences relative to the TSS.

Results: Our statistics found 43 significant sets of human motifs in the JASPAR TF Database with positional preferences relative to the TSS, with 38 preferences tight (±5 bp). Each set of motifs corresponded to a gene group of 135 to 3304 genes, with 42/43 (98%) gene groups independently validated by DAVID, a gene ontology database, with FDR < 0.05. Motifs corresponding to two TFBSs in a RM should co-occur more than by chance alone, enriching the intersection of the gene groups corresponding to the two TFs. Thus, a gene-group intersection systematically enriched beyond chance alone provides evidence that the two TFs participate in an RM. Of the 903 = 43*42/2 intersections of the 43 significant gene groups, we found 768/903 (85%) pairs of gene groups with significantly enriched intersections, with 564/768 (73%) intersections independently validated by DAVID with FDR < 0.05. A user-friendly web site at http://go.usa.gov/3kjsH permits biologists to explore the interaction network of our TFBSs to identify candidate subunit RMs.

Conclusions: Gene duplication and convergent evolution within a genome provide obvious biological mechanisms for replicating an RM near the TSS that binds a particular TF subunit. Of all intersections of our 43 significant gene groups, 85% were significantly enriched, with 73% of the significant enrichments independently validated by gene ontology. The co-localization of TFBSs within RMs therefore likely explains much of the tight TFBS positional preferences near the TSS.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1354-5) contains supplementary material, which is available to authorized users.
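The gene-group intersection argument in the Results can be illustrated with a small sketch. The abstract does not name the statistical test used, so the upper-tail hypergeometric test below is an assumption (a common choice for overlap enrichment), and the function name, parameter names, and example numbers are hypothetical.

```python
from math import comb


def intersection_enrichment_pvalue(n_total, size_a, size_b, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    Assumed null model (not taken from the paper): gene group B is a
    random draw of size_b genes from n_total genes, of which size_a
    belong to gene group A. X counts the genes shared by A and B.
    """
    upper = min(size_a, size_b)
    denominator = comb(n_total, size_b)
    tail = sum(
        comb(size_a, k) * comb(n_total - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denominator


# Number of unordered pairs among 43 gene groups, as in 903 = 43*42/2.
n_pairs = comb(43, 2)

# Hypothetical example: 20000 genes, groups of 500 and 400 genes.
# The expected overlap by chance is 500 * 400 / 20000 = 10 genes.
p_chance_level = intersection_enrichment_pvalue(20000, 500, 400, 10)
p_enriched = intersection_enrichment_pvalue(20000, 500, 400, 40)
```

Under this assumed model, an overlap far above the chance expectation gives a much smaller p-value than an overlap at the chance expectation. Testing all 903 pairs would also need a multiple-testing correction, which this sketch omits.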

Highlights

  • Transcription factors (TFs) form complexes that bind regulatory modules (RMs) within DNA, to control specific sets of genes

  • A small p-value for a cluster suggests that it contains transcription factor binding sites (TFBSs) with positional preferences relative to the transcription start site (TSS)

  • The log-odds scores themselves are the usual logarithm of a ratio, whose numerator is the product of position-specific probabilities, and whose denominator is a 3rd-order Markov background probability
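The log-odds score described in the last highlight can be sketched as follows. This is a minimal illustration, not the authors' implementation: the natural logarithm, the use of upstream bases as context for the first site positions, and all names (`log_odds_score`, `pwm`, `transitions`, `toy_pwm`) are assumptions made for the example.

```python
import math
from itertools import product


def markov_background_logprob(site, context, transitions, order=3):
    """Log-probability of `site` under an order-k Markov background.

    `context` holds at least `order` bases immediately upstream of the
    site; `transitions[kmer][base]` is P(base | preceding k-mer).
    """
    if len(context) < order:
        raise ValueError("context must supply at least `order` bases")
    history = context[-order:]
    logp = 0.0
    for base in site:
        logp += math.log(transitions[history][base])
        history = (history + base)[-order:]  # slide the k-base window
    return logp


def log_odds_score(site, context, pwm, transitions, order=3):
    """log( product of position-specific probs / background prob )."""
    motif_logp = sum(math.log(pwm[i][base]) for i, base in enumerate(site))
    return motif_logp - markov_background_logprob(
        site, context, transitions, order
    )


# Toy inputs (hypothetical): a uniform 3rd-order background and a
# two-position motif that prefers "A" then "C".
uniform = {
    "".join(kmer): {base: 0.25 for base in "ACGT"}
    for kmer in product("ACGT", repeat=3)
}
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
score = log_odds_score("AC", "GGG", toy_pwm, uniform)
```

With a uniform background the score reduces to the familiar position-weight-matrix log-odds. A fitted 3rd-order background would instead penalize sites that are already likely given the three preceding bases.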


Summary

Introduction

Molecular complexes of transcription factors (TFs) can contain subcomplexes (subunits) that bind to regulatory modules (RMs) in DNA to perform important functions in human gene regulation [1,2,3]. Subunits that coordinate TF regulation in relatively narrow sets of genes may be biologically important, but they have probably been studied most in experimental systems outside humans (e.g., bacteriophages [5]). In any case, such subunits must interact with structured RMs specific to the set of genes. The TFBSs must co-localize within the RMs
