Abstract

Protein-RNA complexes play key roles in several cellular processes through the interactions of amino acids with RNA. To understand the recognition mechanism, it is important to identify the specific amino acids involved in RNA binding. Various computational methods have been developed for predicting RNA-binding residues from protein sequence. However, their performance depends mainly on the training dataset, the features selected for developing the model and the learning capacity of the model. Hence, it is important to reveal the correspondence between the performance of these methods and the properties of RNA-binding proteins (RBPs). In this work, we have collected all available methods for predicting RNA-binding residues and assessed their performance on unbiased, stringent and diverse datasets of RBPs with less than 25% sequence identity, grouped by structural class, fold, superfamily, family, protein function, RNA type, RNA strand and RNA conformation. We identify the best methods for each type of RBP, as well as the types of RBPs for which prediction requires further refinement. We also analyzed the performance of these methods on disordered regions, on structures that were not included in the training datasets and on recently solved structures. The resulting predictions are more reliable than those obtained by randomly choosing any method or combination of methods. This approach would be a valuable resource for biologists to choose the best method based on the type of RBP when designing their experiments. The tool is freely accessible online at www.iitm.ac.in/bioinfo/RNA-protein/.

Highlights

  • Protein-RNA interactions play significant roles in many biological processes such as mRNA stabilization and processing [1], protein synthesis [2], post-translational modification [3], [4], assembly and function of ribosomes [5], eukaryotic spliceosome assembly [6] and viral replication [7], [8]

  • The families SM motif of SNRNP and L23p are predicted with the highest accuracies of 89% and 92%, respectively, whereas RNB domain-like and Comoviridae-like VP are poorly predicted, with accuracies of 66% and 59%, respectively. These results show that the prediction methods complement each other across different types of RNA-binding proteins (RBPs)

  • Performance of prediction methods in different datasets: we have evaluated the performance of the methods using two different and independent datasets: i) a dataset of structures that were not included in the training datasets used to develop the individual prediction methods and ii) a dataset of recently solved protein-RNA complex structures


Summary

Introduction

Protein-RNA interactions play significant roles in many biological processes such as mRNA stabilization and processing [1], protein synthesis [2], post-translational modification [3], [4], assembly and function of ribosomes [5], eukaryotic spliceosome assembly [6] and viral replication [7], [8]. Due to the experimental constraints in solving protein-RNA complex structures and the availability of a large number of sequences [12], several methods have been proposed to identify RNA-binding sites from amino acid sequence using computational algorithms [13]–[24]. Wang and Brown (2006) proposed a support vector machine (SVM) model trained with biochemical features of protein sequence and structure, such as molecular mass, hydrophobicity and side chain pKa values, for predicting the binding sites [15]. They improved the prediction accuracy using evolutionary information in the form of position-specific scoring matrices [22]. It is important to reveal the correspondence between the type of a protein and the performance of prediction methods.
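The idea of matching a prediction method to a type of protein can be illustrated with a small sketch. It is not the authors' evaluation pipeline: the record layout, the function names `accuracy` and `best_method_per_type`, the placeholder labels `family_A`, `method_1`, etc., and the use of plain per-residue accuracy are all assumptions made for illustration, and the toy labels are invented. Binding labels are encoded as strings of `1` (binding) and `0` (non-binding), one character per residue.

```python
from collections import defaultdict


def accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose binding label (1/0) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if not observed:
        raise ValueError("label strings must be non-empty")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def best_method_per_type(records):
    """Pick the method with the highest mean accuracy for each RBP type.

    records: iterable of (rbp_type, method, predicted, observed) tuples,
    one tuple per protein and method (a hypothetical layout).
    Returns {rbp_type: (best_method, mean_accuracy)}.
    """
    scores = defaultdict(list)
    for rbp_type, method, predicted, observed in records:
        scores[(rbp_type, method)].append(accuracy(predicted, observed))

    best = {}
    for (rbp_type, method), values in scores.items():
        mean = sum(values) / len(values)
        if rbp_type not in best or mean > best[rbp_type][1]:
            best[rbp_type] = (method, mean)
    return best


# Invented toy data: two placeholder families, two placeholder methods.
records = [
    ("family_A", "method_1", "1100", "1000"),
    ("family_A", "method_2", "1000", "1000"),
    ("family_B", "method_1", "0110", "0110"),
    ("family_B", "method_2", "0000", "0110"),
]
best = best_method_per_type(records)
for rbp_type, (method, mean) in sorted(best.items()):
    print(f"{rbp_type}: {method} (mean accuracy {mean:.2f})")
```

Because binding residues are usually a minority of the sequence, plain accuracy can look high for a method that predicts almost nothing as binding; a real comparison would likely also report sensitivity and specificity.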
