Abstract

Background

Identifying the DNA binding sites for transcription factors is a key task in modeling the gene regulatory network of a cell. Predicting DNA binding sites computationally suffers from high false positives and false negatives due to various contributing factors, including inaccurate models for transcription factor specificity. One source of inaccuracy in the specificity models is the assumption of asymmetry for symmetric models.

Methodology/Principal Findings

Using simulation studies, so that the correct binding site model is known and various parameters of the process can be systematically controlled, we test different motif finding algorithms on both symmetric and asymmetric binding site data. We show that if the true binding site is asymmetric, the results are unambiguous and the asymmetric model is clearly superior to the symmetric model. But if the true binding specificity is symmetric, commonly used methods can infer, incorrectly, that the motif is asymmetric. The resulting inaccurate motifs lead to lower sensitivity and specificity than would the correct, symmetric models. We also show how the correct model can be obtained by the use of appropriate measures of statistical significance.

Conclusions/Significance

This study demonstrates that the most commonly used motif-finding approaches usually model symmetric motifs incorrectly, which leads to higher than necessary false prediction errors. It also demonstrates how alternative motif-finding methods can correct the problem, providing more accurate motif models and reducing the errors. Furthermore, it provides criteria for determining whether a symmetric or asymmetric model is the more appropriate for any experimental dataset.
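To make the symmetric/asymmetric distinction concrete, the following minimal sketch assumes that a "symmetric" motif is one whose position weight matrix equals its own reverse complement. It is an illustration only, not the procedure used in this study: the helper names (`column`, `reverse_complement`, `symmetrize`, `is_symmetric`), the toy consensus sequences, and the 0.7 consensus probability are all hypothetical.

```python
# Illustrative sketch only; not the method used in the study.
# A motif is a list of columns; each column maps a base to its probability.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def column(base, p=0.7):
    """Build one hypothetical motif column favoring `base` with probability p."""
    rest = (1.0 - p) / 3
    return {b: (p if b == base else rest) for b in "ACGT"}


def reverse_complement(pwm):
    """Reverse the column order and swap each base for its complement."""
    return [{COMPLEMENT[b]: prob for b, prob in col.items()}
            for col in reversed(pwm)]


def is_symmetric(pwm, tol=1e-9):
    """True if the motif equals its reverse complement within `tol`."""
    rc = reverse_complement(pwm)
    return all(abs(col[b] - rc_col[b]) <= tol
               for col, rc_col in zip(pwm, rc)
               for b in "ACGT")


def symmetrize(pwm):
    """Average a motif with its reverse complement, forcing symmetry."""
    rc = reverse_complement(pwm)
    return [{b: (col[b] + rc_col[b]) / 2 for b in "ACGT"}
            for col, rc_col in zip(pwm, rc)]


# Toy motifs: GATC is its own reverse complement, GATT is not.
palindromic = [column(b) for b in "GATC"]
asymmetric = [column(b) for b in "GATT"]

print(is_symmetric(palindromic))
print(is_symmetric(asymmetric))
print(is_symmetric(symmetrize(asymmetric)))
```

Averaging with the reverse complement is only one simple way to impose the constraint. Whether a symmetric model is justified for a given dataset is a question of statistical significance, which this sketch does not address.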

Highlights

  • The transcription initiation reaction is facilitated by cis-regulatory regions containing DNA sequence motifs which are binding sites for general and/or specific transcription factors [1,2,3]

  • Transcription is a key step in gene expression and its regulation

  • There are many different approaches to study DNA-protein interactions and the specificity of transcription factors, both using in vivo location analysis and several different types of high-throughput in vitro binding assays [7,8,9,10,11,41]. Most of those data sources do not identify the binding sites or recognition motifs directly, but rely on some type of motif discovery program to determine the specificity of the transcription factor


Summary

Introduction

The transcription initiation reaction is facilitated by cis-regulatory regions containing DNA sequence motifs which are binding sites for general and/or specific transcription factors [1,2,3]. In order for the right gene to be expressed at the right place and time and at the right level, a high degree of specificity during protein-DNA recognition events is required to recruit the transcriptional machinery. The challenging task of identifying cis-regulatory elements often suffers from high false positive and false negative rates. One contributing factor to the error rate is inaccurate models of transcription factor specificity.
