Abstract

The interactions between non-coding RNAs (ncRNAs) and proteins play an important role in many biological processes, and ncRNAs carry out their biological functions primarily by binding a variety of proteins. High-throughput biological techniques are used to identify the protein molecules bound to a specific ncRNA, but they are usually expensive and time-consuming. Deep learning provides a powerful way to predict RNA-protein interactions computationally. In this work, we propose the RPI-SAN model, which uses a deep-learning stacked auto-encoder network to mine hidden high-level features from RNA and protein sequences and feeds them into a random forest (RF) model to predict ncRNA-binding proteins. Stacked ensembling is further used to improve the accuracy of the proposed method. Four benchmark datasets, RPI2241, RPI488, RPI1807, and NPInter v2.0, were employed for the unbiased evaluation of five prediction tools: RPI-Pred, IPMiner, RPISeq-RF, lncPro, and RPI-SAN. The experimental results show that our RPI-SAN model outperforms the other methods, with accuracies of 90.77%, 89.7%, 96.1%, and 99.33% on the four datasets, respectively. We anticipate that RPI-SAN can serve as an effective computational tool for future biomedical research, accurately predicting potential interacting ncRNA-protein pairs and thereby providing reliable guidance for biological research.

Highlights

  • In the human genome, 74.7% of the sequence can be transcribed into RNA, but the total exon sequence of mRNA accounts for only 2.94%.[1–3] The remaining sequence information is output in the form of non-coding RNA, which can be divided into two types: constitutive and regulatory.[4]

  • In this study, we propose a deep learning method named RPI-SAN (RPI: RNA-protein interaction), which combines the stacked auto-encoder network (SAN) with random forest (RF) classifiers and uses the position-specific scoring matrix (PSSM) with Zernike moments and the k-mers sparse matrix with singular value decomposition (SVD) to predict ncRNA-protein interactions.

  • We use the PSSM and the k-mers sparse matrix to extract efficient features from proteins and RNAs, respectively. These features are then fed into the SAN with RF predictors.
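
As a rough illustration of the k-mers sparse matrix with SVD mentioned in the highlights, the sketch below builds a position-by-k-mer indicator matrix for an RNA sequence and reduces it to a fixed-length vector of singular values. Several things here are assumptions rather than details from the paper: the function names kmer_sparse_matrix and svd_features are hypothetical, k = 3 is an arbitrary choice, the matrix layout (one row per possible k-mer, one column per sequence position) is only one plausible construction, and using the leading singular values as the feature vector is a simplification that may differ from the authors' procedure.

```python
from itertools import product

import numpy as np

ALPHABET = "ACGU"  # RNA bases; DNA-style "T" is mapped to "U" below


def kmer_sparse_matrix(seq, k=3):
    """Build a (4**k, len(seq) - k + 1) 0/1 matrix.

    Entry (i, j) is 1 when the k-mer starting at position j of `seq`
    is the i-th k-mer in lexicographic order. Layout is an assumption.
    """
    seq = seq.upper().replace("T", "U")
    n_pos = len(seq) - k + 1
    if n_pos < 1:
        raise ValueError("sequence is shorter than k")
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    mat = np.zeros((len(index), n_pos))
    for j in range(n_pos):
        row = index.get(seq[j:j + k])
        if row is not None:  # skip k-mers containing ambiguous bases
            mat[row, j] = 1.0
    return mat


def svd_features(mat, n_components=8):
    """Return the leading singular values, zero-padded to n_components."""
    singular_values = np.linalg.svd(mat, compute_uv=False)
    feats = np.zeros(n_components)
    m = min(n_components, singular_values.size)
    feats[:m] = singular_values[:m]
    return feats


example_matrix = kmer_sparse_matrix("ACGACGACG", k=3)
example_features = svd_features(example_matrix, n_components=4)
```

Because each column of this matrix holds at most one nonzero entry, its singular values are the square roots of the k-mer counts in descending order. A vector of this kind would still need to be combined with protein features before being passed to any downstream model.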

Introduction

In the human genome, 74.7% of the sequence can be transcribed into RNA, but the total exon sequence of mRNA accounts for only 2.94%.[1–3] The remaining sequence information is output in the form of non-coding RNA (ncRNA), which can be divided into two types: constitutive and regulatory.[4] Compared with mRNA, long non-coding RNA (lncRNA) is shorter, has fewer exons (most often two), has an average abundance of about 1/10 that of mRNA, and shows lower sequence conservation.[5,6,7] lncRNA has been found to participate in all aspects of gene expression regulation by interacting with proteins such as chromatin modification complexes and transcription factors, playing a fundamental role in a variety of important biological processes such as X chromosome inactivation (Xist[8] and Tsix[9]), gene imprinting (H19[10] and Air[11]), and developmental differentiation (HOTAIR[12] and TINCR[13]). Although the role of ncRNA-protein interactions (ncRPIs) in the regulation of gene expression is beyond doubt, the functions and mechanisms of action of only a small number of ncRNAs have been studied. Since ncRNA functions require the coordination of protein molecules, identifying the protein molecules bound to a specific ncRNA has become the main approach to revealing the function and mechanism of ncRNA.
