Abstract
Recognition of genomic binding sites by transcription factors can occur through base-specific recognition, or by recognition of variations within the structure of the DNA macromolecule. In this article, we investigate what information can be retrieved from local DNA structural properties that is relevant to transcription factor binding and that cannot be captured by the nucleotide sequence alone. More specifically, we explore the benefit of employing the structural characteristics of DNA to create binding-site models that encompass indirect recognition for the Escherichia coli model organism. We developed a novel methodology [Conditional Random fields of Smoothed Structural Data (CRoSSeD)], based on structural scales and conditional random fields to model and predict regulator binding sites. The value of relying on local structural-DNA properties is demonstrated by improved classifier performance on a large number of biological datasets, and by the detection of novel binding sites which could be validated by independent data sources, and which could not be identified using sequence data alone. We further show that the CRoSSeD-binding-site models can be related to the actual molecular mechanisms of the transcription factor DNA binding, and thus cannot only be used for prediction of novel sites, but might also give valuable insights into unknown binding mechanisms of transcription factors.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.