Abstract
RNA-binding proteins (RBPs) are involved in many biological processes, including RNA synthesis, protein folding, and alternative splicing. Predicting RBPs can facilitate the discovery and treatment of human diseases such as muscle atrophy, nervous system diseases, and cancer. However, identifying RBPs with experimental methods still poses various challenges. Computational methods, and deep learning in particular, are being deployed to alleviate some of these challenges and to open new avenues of investigation in RBP prediction. Here, we propose DEEPStack-RBP, a novel RBP prediction tool based on deep learning and ensemble learning. First, conjoint triad (CT), local descriptors (LD), pseudo amino acid composition (PseAAC), multivariate mutual information (MMI), and position-specific scoring matrix-transition probability composition (PSSM-TPC) are applied to extract multiple features from the proteins. Subsequently, an autoencoder (AE) is used to eliminate redundancy in the features, and SMOTE-ENN is employed to balance the samples by reducing the difference between the numbers of positive and negative cases. Finally, a stacked ensemble classifier composed of bidirectional long short-term memory (BiLSTM), gated recurrent unit (GRU), and support vector machine (SVM) is used for prediction. On the training dataset RBP9873, DEEPStack-RBP reaches an accuracy (ACC) of 98.76% with a Matthews correlation coefficient (MCC) of 0.9508. On the three independent test datasets of Human, S. cerevisiae, and A. thaliana, the accuracy of the model is 97.16%, 97.67%, and 99.57%, respectively, and the MCC is 0.9405, 0.9499, and 0.9906, respectively. These results show that DEEPStack-RBP can be used as a powerful tool for RBP prediction.
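For readers unfamiliar with the two reported metrics, the sketch below shows how ACC and MCC are conventionally computed from binary confusion-matrix counts. It is a minimal illustration, not the authors' evaluation code: the function names `accuracy` and `mcc` are hypothetical, and the counts in the usage lines are made up for demonstration and are not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, ranging from -1 to +1.

    Returns 0.0 when any marginal sum is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Illustrative counts only (hypothetical, not from the paper).
tp, tn, fp, fn = 90, 85, 15, 10
acc_value = accuracy(tp, tn, fp, fn)
mcc_value = mcc(tp, tn, fp, fn)
print(f"ACC = {acc_value:.4f}, MCC = {mcc_value:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes differ in size, which is presumably why it is reported alongside ACC here.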