Abstract

Background: A key step in the regulation of gene expression is the sequence-specific binding of transcription factors (TFs) to their DNA recognition sites. However, elucidating TF binding site (TFBS) motifs in higher eukaryotes has been challenging, even when employing cross-species sequence conservation. We hypothesized that many orthologous genes expressed in a similarly tissue-specific manner in both human and mouse gene expression data are likely to be co-regulated by orthologous TFs that bind to DNA sequence motifs present within noncoding sequence conserved between these genomes.

Results: We performed automated motif searching and merging across four different motif finding algorithms, followed by filtering of the resulting motifs for those that contain blocks of information content. Applying this motif finding strategy to conserved noncoding regions surrounding co-expressed tissue-specific human genes allowed us to discover both previously known and many novel candidate regulatory DNA motifs in all 18 tissue-specific expression clusters that we examined. For previously known TFBS motifs, we observed that if a TF was expressed in the tissue of interest, then in most cases we identified a motif that matched its TRANSFAC motif; conversely, for the discovered motifs that matched TRANSFAC motifs, most of the corresponding TF transcripts were expressed in the tissue(s) corresponding to the expression cluster for which the motif was found.

Conclusion: Our results indicate that integrating the results from multiple motif finding tools identifies, and ranks highly, more known and novel motifs than does the use of just one of these tools. In addition, we believe that our simultaneous enrichment strategies helped to identify likely human cis-regulatory elements. A number of the discovered motifs may correspond to novel binding site motifs for as yet uncharacterized tissue-specific TFs. We expect this strategy to be useful for identifying motifs in other metazoan genomes.
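
The filtering step described in the Results, keeping motifs that contain blocks of information content, can be illustrated with a minimal sketch. It assumes that information content is measured per column as relative entropy against a uniform background, and that a "block" is a run of consecutive informative columns. The function names, the pseudocount, and the thresholds `min_bits` and `min_block` are hypothetical and are not taken from the paper.

```python
import math

BASES = "ACGT"

def column_information(counts, background=None, pseudocount=0.5):
    """Information content (bits) of one motif column.

    Computed as relative entropy against a background distribution.
    The uniform background and the pseudocount are assumptions of this
    sketch; the pseudocount must be positive to avoid log2(0).
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    total = sum(counts.get(b, 0) for b in BASES) + pseudocount * len(BASES)
    ic = 0.0
    for b in BASES:
        p = (counts.get(b, 0) + pseudocount) / total
        ic += p * math.log2(p / background[b])
    return ic

def has_information_block(columns, min_bits=1.0, min_block=3):
    """True if at least `min_block` consecutive columns reach `min_bits`.

    Both thresholds are hypothetical illustration values.
    """
    run = 0
    for col in columns:
        if column_information(col) >= min_bits:
            run += 1
            if run >= min_block:
                return True
        else:
            run = 0
    return False
```

A motif given as a list of per-position base counts, for example `[{"A": 10}, {"C": 9, "T": 1}, ...]`, would pass the filter only if its informative columns are contiguous rather than scattered.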

Highlights

  • A key step in the regulation of gene expression is the sequence-specific binding of transcription factors (TFs) to their DNA recognition sites

  • We selected 10 data sets that span a range of enrichment scores in the ChIP-chip data [11] and that cover 5 gapped and 5 ungapped TF binding site (TFBS) motifs

  • Although the rankings of the word frequencies are similar among these windows [see Additional Table 1a], the relative over-representation ratios differ; this becomes more apparent when non-overlapping sequence windows are considered [see Additional Table 1b]. (Interestingly, the sequence windows from the first introns were enriched for GC-rich hexamers compared with genome-wide noncoding sequence.) To account for the variable GC content of regions at different locations relative to the transcription start, MultiFinder uses a background model generated from the same genomic sequence window that was used for the motif search
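
The idea of a window-matched background in the last highlight can be sketched as follows. This is an illustrative computation of word over-representation ratios, not MultiFinder's actual background model: it assumes the background is simply the word frequency distribution of sequences taken from the same kind of genomic window as the foreground. The function names and the `floor` parameter are hypothetical.

```python
from collections import Counter

def kmer_frequencies(sequences, k=6):
    """Relative frequency of each k-mer over a set of sequences.

    Words containing characters other than A, C, G, T are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):
                counts[word] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def overrepresentation(foreground, background, k=6, floor=1e-6):
    """Ratio of foreground to background frequency for each k-mer.

    `background` should come from the same genomic window type as
    `foreground` (for example, first introns for both), so that
    GC-content differences between windows do not inflate the ratios.
    `floor` is a hypothetical guard against division by zero.
    """
    fg = kmer_frequencies(foreground, k)
    bg = kmer_frequencies(background, k)
    return {w: f / max(bg.get(w, 0.0), floor) for w, f in fg.items()}
```

With hexamers (`k=6`), a GC-rich word found in first-intron sequences would be scored against first-intron background frequencies rather than genome-wide noncoding frequencies, which is the kind of correction the highlight describes.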


Summary

Introduction

A key step in the regulation of gene expression is the sequence-specific binding of transcription factors (TFs) to their DNA recognition sites. Transcription factor binding sites (TFBSs) are usually short (~5–15 base pairs (bp)), and a typical sequence-specific TF binds to sites that are similar to each other. Motif finding in metazoans has been significantly more challenging than in prokaryotes or yeast because TFBSs in metazoan genomes can be found far away from the promoter regions [1], and because the noncoding regions are typically extremely lengthy.

