Abstract

RNA-binding hot spots are a small and complementary set of interfacial residues that contribute most to the binding energy of protein-RNA interfaces. Because experimental methods for identifying hot spots are time-consuming, labor-intensive and costly, there is great interest in computational approaches that can predict hot spots on a large scale. In this paper, we introduce a sequence-based method that uses an ensemble classifier to predict hot spots in protein-RNA complexes. We first employed three sequence encoding schemes, based on physicochemical properties from the AAindex database, the BLOSUM62 amino acid substitution matrix, and the predicted relative accessible surface area. From these sequence features, 249 individual predictors were developed to identify hot spots, using three learning algorithms: a support vector machine (SVM) with a radial basis function (RBF) kernel, an SVM with a sigmoid kernel, and the k-nearest neighbor algorithm (k-NN). Combinations of these individual predictors under majority voting were explored comprehensively, and an ensemble of 43 individual predictors was selected as the final ensemble classifier. The ensemble classifier outperformed state-of-the-art computational methods, yielding an F1 score of 0.843 and an AUC of 0.893 on the training set, and an F1 score of 0.814 and an AUC of 0.842 on the test set. The data and source code are available at http://bioinfo.ahu.edu.cn:8080/SPHot.
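
The majority-voting step and the F1 metric can be illustrated with a minimal sketch. It assumes each individual predictor has already produced a binary label per interface residue (1 = hot spot, 0 = non-hot spot). The helper names `majority_vote` and `f1_score` are hypothetical and are not taken from the SPHot source code; the tie rule (a tie falls back to non-hot spot) is also an assumption, and it never triggers with an odd ensemble size such as 43.

```python
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Hypothetical helper: combine binary labels (1 = hot spot, 0 = non-hot spot)
    from several individual predictors by simple majority voting.

    predictions[i][j] is predictor i's label for interface residue j.
    """
    if not predictions:
        raise ValueError("at least one predictor is required")
    n_predictors = len(predictions)
    n_residues = len(predictions[0])
    if any(len(p) != n_residues for p in predictions):
        raise ValueError("all predictors must label the same residues")

    combined = []
    for j in range(n_residues):
        votes = sum(p[j] for p in predictions)
        # Strict majority. With an odd number of predictors a tie is impossible;
        # with an even number, a tie falls back to 0 (an assumption of this sketch).
        combined.append(1 if 2 * votes > n_predictors else 0)
    return combined


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2 * precision * recall / (precision + recall), hot spots as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Three toy predictors labeling four residues (illustrative data only).
base_predictions = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
ensemble = majority_vote(base_predictions)
print(ensemble)                                    # [1, 0, 1, 0]
print(round(f1_score([1, 0, 1, 1], ensemble), 3))  # 0.8
```

Plain Python keeps the sketch library-free; in practice the base predictors (RBF-kernel SVM, sigmoid-kernel SVM, k-NN) would come from a machine-learning library, and only their binary outputs are combined here.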

Highlights

  • To compare our model with previous methods for predicting hot spots at protein-RNA interfaces, we used the interface residues of 47 protein-RNA complexes from Pan et al.'s work [15] as our datasets

  • Interface residues with a binding free energy change ΔΔG ≥ 1.0 kcal/mol are defined as hot spots, and those with ΔΔG < 1.0 kcal/mol are considered non-hot spots
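
The labeling rule in the last highlight amounts to a threshold on the per-residue binding free energy change. In this sketch, only the 1.0 kcal/mol cutoff comes from the text; the function name `label_hot_spots` and the list-of-floats input are hypothetical.

```python
from typing import Iterable

# Cutoff from the dataset definition: residues whose binding free energy change
# is at least 1.0 kcal/mol are hot spots.
HOT_SPOT_THRESHOLD_KCAL_MOL = 1.0


def label_hot_spots(ddg_values: Iterable[float]) -> list[int]:
    """Hypothetical helper: map per-residue binding free energy changes
    (kcal/mol) to labels, 1 = hot spot, 0 = non-hot spot."""
    return [1 if ddg >= HOT_SPOT_THRESHOLD_KCAL_MOL else 0 for ddg in ddg_values]


print(label_hot_spots([0.2, 1.0, 2.5, -0.3]))  # [0, 1, 1, 0]
```

Note that a residue exactly at 1.0 kcal/mol counts as a hot spot, matching the "≥" in the definition.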


Introduction

Proteins function by interacting with other molecules through their interfaces, and protein-RNA interactions play an essential role in fundamental cellular functions such as gene expression regulation and structural recognition [1], [2]. Several experiments have shown that the binding free energy of proteins is not uniformly distributed over the interaction surfaces [3], [4]; instead, a small fraction of interface residues, termed hot spots, accounts for the majority of the binding free energy. Identifying hot spots is therefore important for exploring the underlying biological mechanisms and for structural analysis [5]. Mutagenesis techniques such as alanine scanning have been applied to explore RNA-binding hot spots.
