Abstract

Interactions between proteins and DNA play an important role in many essential biological processes such as DNA replication, transcription, splicing, and repair. Identifying the amino acid residues involved in DNA-binding sites is critical for understanding the mechanisms of these biological activities. In the last decade, numerous computational approaches have been developed to predict protein DNA-binding sites from protein sequence and/or structural information, and these approaches play an important role in complementing experimental strategies. Current approaches can be divided into three categories: sequence-based DNA-binding site prediction, structure-based DNA-binding site prediction, and homology modeling and threading. In this article, we review existing research on computational methods to predict protein DNA-binding sites, covering data sets, various residue sequence/structural features, machine learning methods for comparison and selection, evaluation methods, performance comparison of different tools, and future directions in protein DNA-binding site prediction. In particular, we detail the meta-analysis of protein DNA-binding sites. We also propose specific implications that are likely to result in novel prediction methods, increased performance, or practical applications.

Highlights

  • Protein–DNA interactions are widely distributed in all living organisms

  • The prediction methods are based on different benchmark data sets or different evaluation criteria, which complicates the comparison of their advantages and disadvantages; we summarize these studies

  • We discuss future directions and some implications that are likely to result in novel prediction methods, increased performance, or practical applications in the topic of protein DNA-binding site prediction

Summary

Introduction

Protein–DNA interactions are widely distributed in all living organisms. Previous reports have estimated that 2%–3% of a prokaryotic genome and 6%–7% of a eukaryotic genome encode DNA-binding proteins. In the last three decades, many efforts have been made to develop more accurate and efficient approaches in this area. These methods have focused on two aspects: determining whether a protein interacts with DNA and predicting the binding sites. These computational methods have become more accurate and are providing large amounts of data. Predictions based on sequence and structural information are the major computational strategies commonly used to identify DNA-binding residues in a query protein. We discuss the essential biological role of protein–DNA interactions and the picture of DNA-binding proteins and residues obtained from experimental strategies and computational methods. We discuss future directions and some implications that are likely to result in novel prediction methods, increased performance, or practical applications in the topic of protein DNA-binding site prediction.

Benchmark Data Set
Different Residue Properties Used in Developing Predictors
Sequence-Based Features
Structural-Based Features
Physical and Chemical Features
Prediction Methods
Prediction Based on Sequences
Prediction Based on Protein Structures
Homology Modeling and Threading
Prediction Algorithms Based on Individual Descriptors
Prediction Algorithms Based on Simple Statistical Methods
Prediction Algorithms Based on Machine Learning Methods
Hybrid Learning and Meta-Prediction Methods
Performance Measures
Comparison of Different Prediction Methods
Selected Web Servers of DNA-Binding Site Predictors
Methods
Status of the Prediction of Protein-Binding Sites in DNA Sequence
Future Perspectives
Conflicts of Interest
Findings