Abstract

Scanning genomes for potential transcription factor binding sites (TFBSs) is becoming increasingly important in this post-genomic era. The position weight matrix (PWM) is the standard representation of TFBSs used when scanning sequences for potential binding sites. However, many transcription factor (TF) motifs are short and highly degenerate, and PWM-based scanning methods are plagued by false positives. Furthermore, many important TFs do not have well-characterized PWMs, which makes identifying their potential binding sites even more difficult. One approach to identifying sites for these TFs has been to use the 3D structure of the TF to predict the DNA structure around the TF and then to generate a PWM from the predicted 3D complex structure. However, this approach depends on the similarity of the predicted structure to the native structure. We introduce here a novel structure-based approach to identifying TFBSs that can be applied to TFs without characterized PWMs, as long as a 3D structure of the TF/DNA complex exists. This approach uses an energy function that is uniquely trained on each structure. Our approach leads to increased prediction accuracy and robustness compared with approaches that use a more general energy function. The software is freely available upon request.

Highlights

  • One of the central challenges of this post-genomic era is to decipher the complex regulatory networks that control gene expression

  • Given the promise of applying knowledge-based energy functions to the prediction of transcription factor binding sites (TFBSs), we were motivated to explore other modifications that could improve the performance of vcFIRE in this task

  • Structures of 16 Saccharomyces cerevisiae transcription factors (TFs) were obtained from the Protein Data Bank (PDB), and experimentally verified TFBSs for these TFs were obtained from TRANSFAC [40] and the Promoter Database of Saccharomyces cerevisiae (SCPD) [41]

Summary

Introduction

One of the central challenges of this post-genomic era is to decipher the complex regulatory networks that control gene expression. Gene expression is controlled at various stages and involves many factors, including regulatory RNAs, DNA-binding proteins and epigenetic modifications such as DNA methylation [2]. One major regulatory component is the binding of transcription factors (TFs) to specific DNA sequences, which exerts positive or negative control on the transcription of the corresponding target genes. Identifying a comprehensive set of binding sites for a given TF is critical to understanding the role of that TF in gene regulatory networks. Despite this importance, the prediction of potential binding sites for many TFs remains challenging [3].
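To make the PWM-based scanning mentioned in the abstract concrete, the following is a minimal sketch of a log-odds PWM scan in Python. It is not the method introduced in this paper, which uses a structure-trained energy function, and it is not the authors' software. The function names (`build_pwm`, `scan`), the example sites, the pseudocount, the uniform background of 0.25 and the score threshold are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned sites.

    Assumes a uniform background frequency and a simple additive
    pseudocount; both are illustrative choices.
    """
    width = len(sites[0])
    pwm = []
    for i in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        # Log-odds of observing each base at this position vs. background.
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def scan(sequence, pwm, threshold):
    """Slide the PWM along one strand and return (start, window, score) hits."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

# Hypothetical aligned binding sites, invented for this example only.
example_sites = ["TGACTC", "TGACTA", "TGAGTC", "TGACTC"]
example_pwm = build_pwm(example_sites)
example_hits = scan("AAATGACTCAAA", example_pwm, threshold=5.0)
```

Because the score is a sum over only a few positions, a short or degenerate motif leaves little margin between true sites and chance matches, so lowering the threshold to recover weaker sites quickly admits many background windows. This is the false-positive problem the abstract describes.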
