Abstract

The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although several studies have investigated this problem, prediction accuracy is still insufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features play important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved its best performance (86.62% accuracy and a Matthews correlation coefficient of 0.737). This high prediction accuracy suggests that our method can be a useful approach for identifying RNA-binding proteins from sequence information.
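The IFS step and the Matthews correlation coefficient named in the abstract can be illustrated with a minimal sketch. It assumes the features have already been ranked (for example by mRMR) and that a scoring callback is supplied by the caller; the function names `mcc` and `incremental_feature_selection` and the `evaluate` callback are hypothetical and are not taken from the paper, which trains and evaluates a random forest at this step.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def incremental_feature_selection(ranked_features, evaluate):
    """Score every top-k prefix of a ranked feature list and keep the best.

    ranked_features: feature names ordered from most to least relevant
                     (assumed to come from a ranking step such as mRMR).
    evaluate:        hypothetical callback mapping a feature subset to a
                     score, e.g. cross-validated MCC of a classifier.
    """
    best_score, best_subset = float("-inf"), []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        if score > best_score:  # ties keep the smaller subset
            best_score, best_subset = score, subset
    return best_subset, best_score
```

In practice `evaluate` would train a classifier on the given feature subset and return a cross-validated score; keeping it as a callback leaves the sketch independent of any particular learning library.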

Highlights

  • RNA-binding proteins are important functional proteins that are pivotal to a cell’s function, such as in gene expression, posttranscriptional regulation, protein synthesis, and replication and assembly of many viruses [1,2,3,4]

  • To solve the first problem, we proposed a novel feature called evolutionary information combined with physicochemical properties (EIPP)

  • When they were combined with Conjoint Triad (CT), the accuracy and Matthews correlation coefficient (MCC) increased to 76.61% and 0.568, respectively, which is still lower than the performance obtained by combining EIPP, binding propensity (BP), and nonbinding propensity (NBP)
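
For readers unfamiliar with the Conjoint Triad (CT) encoding mentioned in the highlights, the following is a minimal sketch of one common formulation: amino acids are mapped to seven classes, and the frequency of each of the 7 × 7 × 7 = 343 class triads over consecutive residues is recorded. The seven-class grouping and the simple frequency normalization used here are assumptions based on the widely used form of this encoding; the text above does not specify the grouping or normalization the authors applied, and the function name `conjoint_triad` is hypothetical.

```python
from itertools import product

# Assumed seven-class grouping of the 20 standard amino acids
# (a commonly used grouping; not confirmed by the text above).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}


def conjoint_triad(sequence):
    """Return a 343-dimensional vector of class-triad frequencies."""
    counts = {triad: 0 for triad in product(range(7), repeat=3)}
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    total = 0
    # Slide a window of three consecutive residues along the sequence.
    for triad in zip(classes, classes[1:], classes[2:]):
        if None in triad:  # skip windows containing non-standard residues
            continue
        counts[triad] += 1
        total += 1
    # Simple frequency normalization (an assumption of this sketch).
    return [counts[t] / total if total else 0.0 for t in sorted(counts)]
```

Each protein, whatever its length, is thus mapped to a fixed-length vector that can be concatenated with other sequence-derived features before feature selection.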

Summary

Introduction

RNA-binding proteins are important functional proteins that are pivotal to a cell’s function, such as in gene expression, posttranscriptional regulation, protein synthesis, and replication and assembly of many viruses [1,2,3,4]. Discriminating RNA-binding proteins from other proteins is important for understanding the mechanisms of these functions. The reliable identification of RNA-binding proteins is an important research topic in proteomics and will play a vital role in proteome functional annotation, in the discovery of potential therapeutics for genetic diseases, and in reliable diagnostics. Several experimental techniques, such as X-ray crystallography, nuclear magnetic resonance, and filter binding assays, have been used to identify RNA-binding proteins. However, identifying RNA-binding proteins experimentally is costly and time-consuming, so it is desirable to develop computational methods to recognize RNA-binding proteins.
