Abstract

We study the limits imposed by transcription factor specificity on the maximum number of binding motifs that can coexist in a gene regulatory network, using the SwissRegulon Fantom5 collection of 684 human transcription factor binding sites as a model. We describe transcription factor specificity using regular expressions and find that most human transcription factor binding site motifs are separated in sequence space by one to three motif-discriminating positions. We apply theorems based on the pigeonhole principle to calculate the maximum number of transcription factors that can coexist given this degree of specificity; this maximum is on the order of ten thousand and would fully utilize the space of DNA subsequences. Accounting for an expanded DNA alphabet that includes modified bases raises this limit by several orders of magnitude, at a lower level of sequence space usage. Our results may guide the design of transcription factors at both the molecular and system scale.
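
The pigeonhole argument can be illustrated with a minimal counting sketch. It is not the theorem used in the paper: the function name `max_disjoint_motifs`, the motif length, and the number of sequences matched per motif are hypothetical values chosen only to show how the bound scales with alphabet size.

```python
def max_disjoint_motifs(alphabet_size: int, length: int, matches_per_motif: int) -> int:
    """Pigeonhole upper bound on the number of pairwise non-overlapping motifs.

    There are alphabet_size ** length distinct sequences of the given length.
    If every motif matches at least `matches_per_motif` of them and no sequence
    is matched by two motifs, the motif count cannot exceed the quotient.
    """
    if alphabet_size < 1 or length < 1 or matches_per_motif < 1:
        raise ValueError("all arguments must be positive integers")
    return alphabet_size ** length // matches_per_motif


# Hypothetical example values (not taken from the paper):
# motifs of length 6, each matching 4 distinct sequences.
four_letter_bound = max_disjoint_motifs(alphabet_size=4, length=6, matches_per_motif=4)
ten_letter_bound = max_disjoint_motifs(alphabet_size=10, length=6, matches_per_motif=4)

print(four_letter_bound)  # 4**6 // 4 = 1024
print(ten_letter_bound)   # 10**6 // 4 = 250000
```

Under these assumed numbers, moving from a four-letter to a ten-letter alphabet raises the bound by more than two orders of magnitude, which is the qualitative effect the abstract describes.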

Highlights

  • In order to understand and preserve molecular biodiversity, it is valuable to investigate if evolution has explored all the options that are possible in theory

  • We propose that the effective alphabet size of DNA may be over ten letters, which would significantly increase all theoretical estimates for the maximal number of coexisting transcription factor binding site (TFBS) motifs

  • We generate a regular expression from each matrix, using information theory to minimize the loss of information
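
One way to turn a matrix of per-position base probabilities into a regular expression is sketched below. This is an assumption-laden illustration, not the authors' procedure: the thresholds `min_bits` and `min_prob`, the function names, and the example matrix are all hypothetical. The sketch uses the information content of each column (in bits) to decide whether a position is treated as unconstrained.

```python
import math
import re

BASES = "ACGT"


def column_information(column):
    """Information content of one column in bits: log2(4) minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column if p > 0)
    return math.log2(len(BASES)) - entropy


def column_to_class(column, min_bits=0.25, min_prob=0.25):
    """Map one probability column to a base or a character class.

    Assumed rule (hypothetical): columns carrying less than `min_bits` of
    information accept any base; otherwise keep bases with probability of at
    least `min_prob`, falling back to the most probable base if none qualify.
    """
    if column_information(column) < min_bits:
        kept = list(BASES)
    else:
        kept = [b for b, p in zip(BASES, column) if p >= min_prob]
        if not kept:
            kept = [BASES[max(range(len(BASES)), key=lambda i: column[i])]]
    return kept[0] if len(kept) == 1 else "[" + "".join(kept) + "]"


def matrix_to_regex(matrix):
    """Concatenate the per-position classes into one regular expression."""
    return "".join(column_to_class(column) for column in matrix)


# Hypothetical four-position matrix; each row lists P(A), P(C), P(G), P(T).
example_matrix = [
    [0.97, 0.01, 0.01, 0.01],
    [0.45, 0.05, 0.45, 0.05],
    [0.25, 0.25, 0.25, 0.25],
    [0.02, 0.02, 0.02, 0.94],
]

pattern = matrix_to_regex(example_matrix)
print(pattern)
print(bool(re.fullmatch(pattern, "AGCT")))
print(bool(re.fullmatch(pattern, "ACCT")))
```

Collapsing probabilities into character classes discards information, so the choice of thresholds controls how much of the matrix survives in the expression.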


Summary

Introduction

In order to understand and preserve molecular biodiversity, it is valuable to investigate whether evolution has explored all the options that are possible in theory. Gene networks regulate the expression of up to thousands of genes via interactions between genomic DNA and proteins such as transcription factors [6, 7]. The components of gene regulatory networks interact in a specific manner: each transcription factor usually recognizes a subset of all possible genomic DNA subsequences, and different transcription factors usually recognize non-overlapping sets of DNA subsequences. Some natural transcription factors, however, show similar binding specificities [8].
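
The idea of non-overlapping recognition sets can be made concrete with a small sketch. It assumes, for illustration only, that a motif is a list of allowed-base sets of fixed length, that the two motifs are compared in a single alignment without shifts or reverse complements, and that a position discriminates two motifs when their allowed-base sets share no base. The function names and example motifs are hypothetical.

```python
def discriminating_positions(motif_a, motif_b):
    """Return the indices at which two equal-length motifs share no allowed base."""
    if len(motif_a) != len(motif_b):
        raise ValueError("this sketch only compares motifs of equal length")
    return [
        i
        for i, (allowed_a, allowed_b) in enumerate(zip(motif_a, motif_b))
        if not set(allowed_a) & set(allowed_b)
    ]


def motifs_overlap(motif_a, motif_b):
    """Two motifs match a common sequence only if no position discriminates them."""
    return not discriminating_positions(motif_a, motif_b)


# Hypothetical motifs, one string of allowed bases per position.
motif_1 = ["A", "AG", "ACGT", "T"]
motif_2 = ["A", "CT", "ACGT", "T"]
motif_3 = ["A", "A", "C", "T"]

print(discriminating_positions(motif_1, motif_2))  # position 1: {A, G} vs {C, T}
print(motifs_overlap(motif_1, motif_2))
print(motifs_overlap(motif_1, motif_3))            # both match the sequence ACT preceded by A, i.e. AACT
```

A single discriminating position is enough to make the two recognition sets disjoint in this simplified model, since no sequence can satisfy both motifs at that position.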
