Abstract

The development of in vitro technologies has produced new experimental data on protein binding to DNA. These data are accumulated in databases and used both in studies of the mechanisms regulating gene expression and in the development of computer-assisted methods for recognizing binding sites in pro- and eukaryotic genomes. However, it remains unclear to what extent in vitro selected sequences reflect the actual structure of real transcription factor (TF) binding sites. The Kullback–Leibler divergence has been applied to compare frequency matrices of TF binding sites constructed from sets of artificially selected sequences and from real sites. The core sequences of real and artificial sites are similar for 80% of the TFs studied. For 20% of TFs, in vitro selected binding site sequences show a broader range of permissible significant nucleotides that are not found in real sites. The optimal lengths of DNA sequences containing real binding sites, at which the sites are recognized most accurately, are estimated by the weight matrix method. For approximately 80% of the TFs studied, the optimal binding site length notably exceeds the length of the core sequence, as well as the length of the in vitro selected sites. These features of in vitro selected TF binding sites constrain their use in the development of computer-assisted methods for recognizing candidate sites in genomic sequences.
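To make the matrix comparison concrete, the following is a minimal Python sketch of a Kullback–Leibler divergence between two position frequency matrices. It rests on several assumptions that the abstract does not confirm: the sites are pre-aligned and of equal length, a pseudocount of 0.5 is added to avoid zero frequencies, and the divergence is summed over columns in its asymmetric form. The function names and the toy sequences are hypothetical and are not data from the study.

```python
import math

BASES = "ACGT"


def frequency_matrix(sites, pseudocount=0.5):
    """Build per-position nucleotide frequencies from aligned, equal-length sites.

    The pseudocount is an assumption of this sketch; it keeps every
    frequency positive so that the logarithm below is always defined.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({base: counts[base] / total for base in BASES})
    return matrix


def kl_divergence(p, q):
    """Sum over positions of KL(p_column || q_column), in bits (asymmetric)."""
    if len(p) != len(q):
        raise ValueError("matrices must have the same number of positions")
    return sum(
        p_col[base] * math.log2(p_col[base] / q_col[base])
        for p_col, q_col in zip(p, q)
        for base in BASES
    )


# Hypothetical toy alignments, for illustration only.
real_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
selected_sites = ["TGACTCA", "TGAGTCA", "TTACTCA", "TGCGTCA"]

real_matrix = frequency_matrix(real_sites)
selected_matrix = frequency_matrix(selected_sites)
divergence = kl_divergence(real_matrix, selected_matrix)
```

A divergence of zero means the two matrices are identical, and larger values indicate that the in vitro selected set departs further from the real sites. Because the measure is asymmetric, swapping the arguments generally gives a different value; a symmetrized variant may be preferable depending on the question asked.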
