Abstract

Computational prediction of nucleic acid binding sites in proteins is necessary to disentangle the functional mechanisms of most biological processes and to explore binding mechanisms. Several strategies have been proposed, but state-of-the-art approaches differ widely in i) the definition of nucleic acid binding sites; ii) the training and test datasets; iii) the algorithms used for prediction; iv) the performance measures; and v) the distribution and availability of the prediction programs. Here we report a large-scale assessment of 19 web servers and 3 stand-alone programs on 41 datasets comprising more than 5000 proteins derived from 3D structures of protein-nucleic acid complexes. Well-defined binary assessment criteria (specificity, sensitivity, precision, accuracy…) are applied. We found that i) the tools have improved greatly over the years; ii) some approaches suffer from theoretical defects, and there is still room for elucidating the essential mechanisms of binding; iii) RNA binding and DNA binding appear to follow similar driving forces; and iv) dataset bias may exist in some methods.
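The abstract names the binary assessment criteria without giving their formulas. The sketch below uses the standard confusion-matrix definitions, assuming each residue carries a binary label (1 = binding, 0 = non-binding) and a binary prediction. The function name `binary_metrics` and the choice to return 0.0 when a denominator is zero are assumptions for illustration, not necessarily how the assessment computed these values.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Per-residue binary metrics; 1 = binding residue, 0 = non-binding residue.

    Hypothetical helper: standard confusion-matrix definitions, not the
    assessment's own code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction vectors must have equal length")

    # Count the four confusion-matrix cells over all residues.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Assumption: report 0.0 instead of dividing by zero
        # (e.g. a predictor that labels no residue as binding has no precision).
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),  # fraction of true binding residues recovered
        "specificity": ratio(tn, tn + fp),  # fraction of non-binding residues rejected
        "precision": ratio(tp, tp + fp),    # fraction of predicted binding residues that are correct
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }
```

For example, `binary_metrics([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])` gives sensitivity 0.5, specificity 0.75, precision 0.5 and accuracy of about 0.667. Because binding residues are usually a minority of all residues, accuracy alone can look high even for a predictor that finds few binding sites, which is one reason several criteria are reported together.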

Highlights

  • Protein-nucleic acid (RNA/DNA) interactions play crucial roles in most biological processes[1], and detecting the functional sites/regions in proteins is an important step toward structurally understanding the molecular mechanisms of these processes

  • Nucleic acid binding sites in proteins are functionally important in a majority of biological processes

  • Predicting these binding sites gives the biological community a first step toward understanding nucleic acid binding proteins


Summary

Introduction

Protein-nucleic acid (RNA/DNA) interactions play crucial roles in most biological processes [1], and detecting the functional sites/regions in proteins is an important step toward a structural understanding of the molecular mechanisms underlying these processes. The definition of a nucleic acid binding residue is not standardized: definitions range from distance cutoffs [8,9,12,13,14,15] to the enumeration of non-covalent contacts [16,17,18,19] (Supplementary Note 3 in S1 Text). This makes the goal of the prediction problem ambiguous and causes variations in prediction accuracy. The distribution and ease of use of the programs also largely determine how helpful they are to users in the biological community.
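To make the distance-cutoff definition concrete, here is a minimal sketch that labels a residue as binding when any of its atoms lies within a cutoff distance of any nucleic acid atom. The 3.5 Å default, the residue identifiers and the function name `label_binding_residues` are hypothetical illustrations; the cited methods use their own cutoffs and atom selections, and contact-enumeration definitions are not covered here.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]  # atom position in Å


def label_binding_residues(
    protein_residues: Dict[str, Sequence[Coord]],
    nucleic_acid_atoms: Sequence[Coord],
    cutoff: float = 3.5,  # hypothetical default; published cutoffs differ between methods
) -> Dict[str, bool]:
    """Label a residue as binding if any of its atoms lies within `cutoff`
    of any nucleic acid atom (hypothetical helper, distance-cutoff definition only)."""
    labels: Dict[str, bool] = {}
    for res_id, atoms in protein_residues.items():
        # Brute-force all atom pairs; any() stops at the first close contact.
        labels[res_id] = any(
            math.dist(a, b) <= cutoff for a in atoms for b in nucleic_acid_atoms
        )
    return labels


# Hypothetical toy structure: one residue with an atom 3.0 Å from the nucleic acid.
residues = {
    "A:ARG12": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "A:GLY13": [(10.0, 0.0, 0.0)],
}
na_atoms = [(4.0, 0.0, 0.0)]
print(label_binding_residues(residues, na_atoms))
print(label_binding_residues(residues, na_atoms, cutoff=2.5))
```

Here `A:ARG12` is labeled binding at the 3.5 Å cutoff but not at 2.5 Å, a small illustration of how the choice of definition alone changes the ground-truth labels a predictor is trained and tested on. The all-pairs loop is quadratic in the number of atoms; for full structures a spatial index such as a k-d tree would be the usual choice.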

