Abstract
Background: The interactions between non-coding RNAs (ncRNA) and proteins play an essential role in many biological processes. Several high-throughput experimental methods have been applied to detect ncRNA-protein interactions. However, these methods are time-consuming and expensive. Accurate and efficient computational methods can assist and accelerate the study of ncRNA-protein interactions.
Results: In this work, we develop a stacking ensemble computational framework, RPI-SE, for effectively predicting ncRNA-protein interactions. More specifically, to fully exploit protein and RNA sequence features, a Position Weight Matrix combined with Legendre Moments is applied to obtain protein evolutionary information. Meanwhile, a k-mer sparse matrix is employed to extract efficient features from ncRNA sequences. Finally, an ensemble learning framework integrating different types of base classifiers is developed to predict ncRNA-protein interactions using these discriminative features. The accuracy and robustness of RPI-SE were evaluated on three benchmark data sets under five-fold cross-validation and compared with other state-of-the-art methods.
Conclusions: The results demonstrate that RPI-SE is competent for the ncRNA-protein interaction prediction task, with high accuracy and robustness. It is anticipated that this work can provide a computational prediction tool to advance biomedical research related to ncRNA-protein interactions.
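To make the k-mer sparse matrix idea concrete, the following is a minimal sketch under stated assumptions, not the RPI-SE implementation. It assumes one common formulation: a binary matrix with one row per possible k-mer and one column per start position, where an entry is 1 when that k-mer begins at that position. The function names, the default k = 3, the mapping of T to U, and the collapse into a frequency vector are all illustrative choices that the abstract does not specify.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; T is mapped to U below


def kmer_index(k):
    """Map every possible k-mer over ALPHABET to a row index."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}


def kmer_sparse_matrix(seq, k=3):
    """Return (n_rows, n_cols, entries) for a binary k-mer position matrix.

    entries lists the (row, col) pairs whose value is 1, in order of position:
    row is the index of the k-mer and col is the position where it starts.
    Windows containing characters outside ALPHABET are skipped.
    """
    seq = seq.upper().replace("T", "U")
    index = kmer_index(k)
    n_cols = max(len(seq) - k + 1, 0)
    entries = []
    for col in range(n_cols):
        row = index.get(seq[col:col + k])
        if row is not None:
            entries.append((row, col))
    return len(index), n_cols, entries


def kmer_frequencies(seq, k=3):
    """Collapse the position matrix into a normalized k-mer frequency vector."""
    n_rows, _, entries = kmer_sparse_matrix(seq, k)
    counts = [0] * n_rows
    for row, _ in entries:
        counts[row] += 1
    total = len(entries)
    return [c / total if total else 0.0 for c in counts]
```

Storing only the non-zero (row, col) pairs keeps the representation compact, since each column holds at most one 1. How the matrix is reduced to a fixed-length feature vector is a separate design choice; the frequency vector here is only the simplest option.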
Highlights
The interactions between non-coding RNAs and proteins play an essential role in many biological processes
We propose a stacking-ensemble-based computational model, RPI-SE, that integrates Gradient Boosting Decision Tree (GBDT, implemented by XGBoost) [27], Support Vector Machine (SVM) [28, 29] and Extremely Randomized Trees (ExtraTree) [30] algorithms to predict non-coding RNA (ncRNA)-protein interactions
In this work, we proposed RPI-SE, a stacking-ensemble-based computational model for predicting ncRNA-protein interactions, which integrates the XGBoost, SVM and ExtraTree algorithms and uses highly efficient features
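The stacking arrangement described in these highlights can be sketched as follows. This is an illustration under assumptions, not the authors' code: it uses scikit-learn's `StackingClassifier`, substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost so that the example needs only one library, uses logistic regression as an assumed meta-learner, and runs on synthetic data in place of real ncRNA-protein feature vectors. All hyperparameters are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacking_model(seed=0):
    """Three heterogeneous base classifiers feeding a logistic-regression meta-learner."""
    base_learners = [
        # Stand-in for XGBoost: scikit-learn's own gradient-boosted trees.
        ("gbdt", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
        # SVMs are scale-sensitive, so standardize features first.
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=seed))),
        ("extra_trees", ExtraTreesClassifier(n_estimators=50, random_state=seed)),
    ]
    # cv=5: the meta-learner is trained on out-of-fold base predictions.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(),
        cv=5,
    )


# Synthetic stand-in for concatenated protein and ncRNA feature vectors.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=8, random_state=0
)

# Five-fold cross-validation, mirroring the evaluation protocol in the abstract.
scores = cross_val_score(build_stacking_model(), X, y, cv=5, scoring="accuracy")
print("mean accuracy:", scores.mean())
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for samples the base models have already seen, is what keeps a stacked model from simply memorizing the base learners' training-set outputs.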
Summary
The interactions between non-coding RNAs (ncRNA) and proteins play an essential role in many biological processes. Several high-throughput experimental methods have been applied to detect ncRNA-protein interactions. The remaining 98% of the genes are mainly responsible for regulation, that is, they are involved in controlling when and where genes are expressed and activated [2]. This large part of the genome produces RNA molecules that vary in size, structure, and function; these are called non-coding RNAs (ncRNA) [3]. ncRNAs can be divided into several categories, which are widely present