Abstract

Non-coding RNAs (ncRNAs) play crucial roles in multiple fundamental biological processes, such as post-transcriptional gene regulation, and are implicated in many complex human diseases. Most ncRNAs function by interacting with their corresponding RNA-binding proteins, so research on ncRNA–protein interactions is key to understanding ncRNA function. However, experimental techniques for identifying RNA–protein interactions (RPIs) are still expensive and time-consuming. Because of the complex molecular mechanism of ncRNA–protein interaction and the lack of sequence conservation among ncRNAs, especially long ncRNAs (lncRNAs), predicting ncRNA–protein interactions remains a challenge. Deep learning-based models have become the state of the art in a range of biological sequence analysis problems owing to their strong feature-learning ability. In this study, we propose RPITER, a hierarchical deep learning framework for predicting RNA–protein interactions. For sequence coding, we improved the conjoint triad feature (CTF) coding method by adding more primary sequence information and sequence structure information. For model design, RPITER employs two basic neural network architectures: the convolutional neural network (CNN) and the stacked auto-encoder (SAE). Comprehensive experiments were performed on five benchmark datasets from the PDB and NPInter databases to analyze and compare the performance of different sequence coding methods and prediction models. We found that the CNN and SAE architectures have powerful fitting abilities for the k-mer features of RNA and protein sequences. The improved CTF coding method showed a performance gain over the original CTF method. Moreover, RPITER performed well in predicting RPIs and outperformed most previous methods.
On five widely used RPI datasets, RPI369, RPI488, RPI1807, RPI2241 and NPInter, RPITER obtained AUC of 0.821, 0.911, 0.990, 0.957 and 0.985, respectively. The proposed RPITER could be a complementary method for predicting RPIs and constructing RPI networks, which would help push forward related biological research on ncRNAs and lncRNAs.
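
As a rough illustration of the kind of k-mer coding the abstract refers to, the sketch below computes the original conjoint triad feature for a protein (3-mers over a reduced 7-class amino-acid alphabet, giving 343 values) and plain k-mer frequencies for an RNA. It shows the baseline coding only, not RPITER's improved coding with extra primary-sequence and structure information. The function names, the choice of k = 4 for RNA, and normalization by the total k-mer count are assumptions made for illustration.

```python
from itertools import product

# Seven amino-acid classes used by the original conjoint triad method.
AA_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, group in enumerate(AA_GROUPS) for aa in group}


def kmer_frequencies(seq, alphabet, k):
    """Return k-mer frequencies of `seq` in a fixed lexicographic k-mer order.

    Assumption: frequencies are normalized by the number of valid windows.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows with unknown symbols are skipped
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def protein_ctf(protein):
    """Hypothetical helper: 343-dim conjoint triad vector of a protein."""
    # Map each residue to its class label; unknown residues become "X".
    reduced = "".join(AA_TO_GROUP.get(aa, "X") for aa in protein.upper())
    return kmer_frequencies(reduced, "0123456", 3)


def rna_kmer(rna, k=4):
    """Hypothetical helper: 4**k-dim k-mer vector of an RNA (k=4 assumed)."""
    return kmer_frequencies(rna.upper().replace("T", "U"), "ACGU", k)


if __name__ == "__main__":
    pair = protein_ctf("MKVLAAGIVRDE") + rna_kmer("ACGUACGUGGCA")
    print(len(pair))  # length of the concatenated protein + RNA vector
```

The two vectors can be concatenated, or fed to separate network branches, as input to a classifier; which arrangement a given model uses is a design choice not specified here.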

Highlights

  • In the human genome, protein-coding genes account for only about 2%, while the vast majority are non-coding RNAs, which function directly at the RNA level [1,2]

  • The performance of the three sequence coding methods in five-fold cross-validation (CV) on the RPI2241 dataset is shown in Table 1 and Figure 2a


Summary

Introduction

In the human genome, protein-coding genes account for only about 2%, while the vast majority are non-coding RNAs (ncRNAs), which function directly at the RNA level [1,2]. Bellucci et al. developed catRAPID [19,20] based on the physicochemical properties of proteins and RNAs, including secondary structure, hydrogen bonding and van der Waals propensities. Wang et al. [23] proposed methods based on Naive Bayes (NB) and Extended NB (ENB) classifiers and performed work similar to that of Muppirala et al. Lu et al. [24] created a method named lncPro, which is based on the Fisher linear discriminant approach and uses secondary structure, hydrogen-bond and van der Waals propensities as input features. IPMiner [26] uses a stacked auto-encoder (SAE) to learn high-level representations of the CTF-coded RNA and protein features, predicts RNA–protein interactions with a random forest (RF) classifier, and further improves prediction accuracy with a logistic regression (LR)-based model ensemble.

