Abstract

Empirically, it has been observed in several cases that the information content of transcription factor binding site sequences (R_sequence) approximately equals the information content of binding site positions (R_frequency). A general framework for formal models of transcription factors and binding sites is developed to address this issue. Measures of information content in transcription factor binding sites are revisited, and theoretical analyses are compared on this basis. These analyses do not lead to consistent results. A comparative review reveals that the inconsistent approaches do not include a transcription factor state space. Therefore, a state space that mathematically represents transcription factors with respect to their binding site recognition properties is introduced into the modelling framework. Analysis of the resulting comprehensive model shows that the structure of the genome state space does indeed favour equality of R_sequence and R_frequency, but that the relation between the two information quantities also depends on the structure of the transcription factor state space. This dependence might lead to significant deviations between R_sequence and R_frequency. However, further investigation and biological arguments show that the effects of the structure of the transcription factor state space on this relation are strongly limited for systems that are autonomous in the sense that all DNA-binding proteins operating on the genome are encoded in the genome itself. This provides a theoretical explanation for the empirically observed equality.

