Abstract

Sequence-specific transcription factors (TFs) recognize motifs of related nucleotide sequences at their DNA binding sites. Upon binding at these sites, TFs regulate critical molecular processes such as gene expression. It is widely assumed that a TF recognizes a single “canonical” motif, although recent studies have identified additional “non-canonical” motifs for some TFs. A comprehensive approach to identify non-canonical DNA binding motifs and the functional importance of those motifs’ matches in the human genome is necessary for fully understanding the mechanisms of TF-regulated molecular processes in human cells. To address this need, we developed a statistical pipeline for in vitro HT-SELEX data that identifies and characterizes the distributions of non-canonical TF motifs in a stringent manner. Analyzing ~170 human TFs’ HT-SELEX data, we found non-canonical motifs for 19 TFs (11%). These non-canonical motifs occur independently of the TFs’ canonical motifs. Non-canonical motif occurrences in the human genome show similar evolutionary conservation to canonical motif occurrences, explain TF binding in locations without canonical motifs, and occur within gene promoters and epigenetically marked regulatory sequences in human cell lines and tissues. Our approach and collection of non-canonical motifs expand current understanding of functionally relevant DNA binding sites for human TFs.

Highlights

  • Sequence-specific regulatory proteins, known as transcription factors (TFs), are generally assumed to recognize a single motif of related nucleotide sequences at their DNA binding sites

  • Recent discoveries in TF-DNA binding specificity have highlighted that TFs integrate several types of information to identify their specific target sites [13, 30]

  • Some discoveries have questioned the common assumption that a TF recognizes only a single sequence motif [28]

Introduction

Sequence-specific regulatory proteins, known as transcription factors (TFs), are generally assumed to recognize a single motif of related nucleotide sequences at their DNA binding sites. Recent studies [18, 28] have shown that some TFs recognize motifs that differ from their single “canonical” motifs. This phenomenon of “non-canonical” motifs was first described in protein-binding microarray (PBM) data [3, 19], but later HT-SELEX (high-throughput systematic evolution of ligands by exponential enrichment) datasets suggested that motifs found in addition to the canonical motifs are not very distinct: most often, they arise from a TF’s ability to dimerize [14] or from minor sequence variations flanking the … By utilizing in vivo TF-DNA binding data, evolutionary conservation, and epigenetically …
