Abstract

Background: RNA-binding proteins participate in many important biological processes involving RNA-mediated gene regulation, and several computational methods have recently been developed to predict the protein-RNA interactions of these proteins. Newly developed discriminative descriptors can help improve the accuracy of such prediction methods and provide researchers with further meaningful information.

Results: In this work, we designed two structural features: residue electrostatic surface potential and triplet interface propensity. Statistical and structural analyses of protein-RNA complexes showed that both features were powerful for identifying RNA-binding residues. Using these two features together with other strong structure- and sequence-based features, we constructed a random forest classifier to predict RNA-binding residues. Five-fold cross-validation on the training set RBP195 gave an area under the receiver operating characteristic curve (AUC) of 0.900, and on the test set RBP68 the method achieved a prediction accuracy (ACC) of 0.868 and an F-score of 0.631.

Conclusions: The good prediction performance of our method suggests that the two newly designed descriptors are discriminative for inferring protein residues that interact with RNA. To facilitate the use of our method, we built a web server called RNAProSite, which implements the proposed method and is freely available at http://lilab.ecust.edu.cn/NABind.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1110-x) contains supplementary material, which is available to authorized users.
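The abstract reports three metrics: AUC under five-fold cross-validation, and ACC and F-score on RBP68. Below is a minimal, dependency-free sketch of how these metrics are conventionally computed from per-residue labels and classifier scores. It assumes the F-score is the standard F1 (harmonic mean of precision and recall) and that predictions are binarized at a score threshold of 0.5; the threshold, function names, and toy numbers are illustrative assumptions, not values from the paper.

```python
from typing import List, Sequence, Tuple


def binarize(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a residue as RNA-binding (1) if its score reaches the (assumed) threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for 0/1 labels, where 1 = RNA-binding residue."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """ACC = (TP + TN) / all residues."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def f_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2PR / (P + R), written equivalently as 2TP / (2TP + FP + FN)."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative residue")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0, 0]  # toy ground truth, not data from the paper
    scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]  # toy classifier scores
    preds = binarize(scores)
    print(accuracy(labels, preds), f_score(labels, preds), roc_auc(labels, scores))
```

The pairwise AUC above is equivalent to the area under the ROC curve but costs O(P·N), so it is only meant for illustration; for real residue sets, a library routine such as scikit-learn's `roc_auc_score` is the practical choice.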

Highlights

  • RNA-binding proteins participate in many important biological processes concerning RNA-mediated gene regulation, and several computational methods have been recently developed to predict the protein-RNA interactions of RNA-binding proteins

  • Datasets: Two groups of datasets are used in this study. (i) RBP195 was used to construct the prediction model proposed in this study, and RBP68 was used to benchmark our model against other commonly available models. (ii) RBP138 and RBP42 were constructed to evaluate how several factors affect prediction performance, such as the composition of the datasets, the choice of machine-learning algorithm, and the method used to define the RNA-binding sites of proteins.

  • The distribution curves of the positive and negative samples cross at an electrostatic potential value of approximately 0.014. Below this value, the negative samples make up a higher proportion than the positive samples; above it, the opposite holds.
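The crossover in the last highlight is the point where the per-class proportion curves of electrostatic potential intersect. A minimal sketch of locating it, assuming per-residue potential values have already been computed and are binned on shared, hypothetical bin edges: it compares proportions rather than raw counts so that classes of different sizes are comparable, and it interpolates linearly between bin centres. The function names, bin edges, and toy values are illustrative and not taken from the paper.

```python
from bisect import bisect_right
from typing import List, Sequence


def class_proportions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of all `values` in each bin [edges[i], edges[i+1]); the last bin is closed."""
    if not values:
        raise ValueError("need at least one value")
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v == edges[-1]:
            counts[-1] += 1
            continue
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(counts):  # values outside the edges are ignored
            counts[i] += 1
    # Divide by the total sample count so both classes are on the same scale.
    return [c / len(values) for c in counts]


def crossing_points(pos: Sequence[float], neg: Sequence[float],
                    edges: Sequence[float]) -> List[float]:
    """Approximate potentials where the positive and negative proportion curves cross."""
    p = class_proportions(pos, edges)
    n = class_proportions(neg, edges)
    centers = [(edges[i] + edges[i + 1]) / 2 for i in range(len(edges) - 1)]
    diff = [a - b for a, b in zip(p, n)]  # > 0 where binding residues dominate
    crossings = []
    for i in range(len(diff) - 1):
        d0, d1 = diff[i], diff[i + 1]
        if (d0 < 0 <= d1) or (d0 > 0 >= d1):
            # Linear interpolation between adjacent bin centres to where diff == 0.
            t = -d0 / (d1 - d0)
            crossings.append(centers[i] + t * (centers[i + 1] - centers[i]))
    return crossings


if __name__ == "__main__":
    edges = [-0.04, -0.02, 0.0, 0.02, 0.04]  # hypothetical bin edges
    neg = [-0.035, -0.03, -0.025, -0.015, 0.005]  # toy non-binding residues
    pos = [-0.015, 0.005, 0.015, 0.025, 0.035]  # toy binding residues
    print(crossing_points(pos, neg, edges))
```

Because the estimate depends on the chosen bins, a point where the difference touches zero and turns back is also reported as a crossing; smoothing the two distributions first (for example with kernel density estimates) would give a less bin-dependent value.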


Summary

Introduction

RNA-binding proteins participate in many important biological processes involving RNA-mediated gene regulation, and several computational methods have recently been developed to predict the protein-RNA interactions of these proteins. Several fundamental structural and physicochemical principles underlying the mutual recognition of protein and RNA have been discovered [7,8,9,10,11,12,13]. These computational predictors can be broadly divided into sequence-based and structure-based predictors according to the key information they use to characterize protein residues. Several other descriptors are also commonly used, including predicted solvent accessibility [20,21,22], predicted secondary structure [22], and physicochemical properties [18, 20, 21, 23, 24]. Most of these sequence-based methods are built on support vector machines (SVMs), although a few adopt other classification algorithms, such as Naïve Bayes [25] and the C4.5 decision tree [18]. One recently developed structure-based method can predict both RNA- and DNA-binding residues with excellent performance [36].

