Abstract

LncRNA-protein interactions play important roles in post-transcriptional gene regulation, polyadenylation, splicing and translation, so identifying them helps to understand lncRNA-related activities. Existing computational methods use multiple lncRNA features or multiple protein features to predict lncRNA-protein interactions, but these features are not available for all lncRNAs or proteins. Moreover, most existing methods cannot predict interacting proteins (or lncRNAs) for new lncRNAs (or proteins), that is, those with no known interactions. In this paper, we propose a sequence-based feature projection ensemble learning method, “SFPEL-LPI”, to predict lncRNA-protein interactions. First, SFPEL-LPI extracts lncRNA sequence-based features and protein sequence-based features. Second, SFPEL-LPI calculates multiple lncRNA-lncRNA similarities and protein-protein similarities from lncRNA sequences, protein sequences and known lncRNA-protein interactions. Third, SFPEL-LPI combines the multiple similarities and multiple features in a feature projection ensemble learning framework. In computational experiments, SFPEL-LPI accurately predicts lncRNA-protein interactions and outperforms other state-of-the-art methods. More importantly, SFPEL-LPI can be applied to new lncRNAs (or proteins). The case studies demonstrate that our method can find novel lncRNA-protein interactions that are confirmed by the literature. Finally, we construct a user-friendly web server, available at http://www.bioinfotech.cn/SFPEL-LPI/.

Highlights

  • Long noncoding RNAs are a class of transcribed RNA molecules with a length of more than 200 nucleotides that do not encode proteins [1,2]

  • We propose a sequence-based feature projection ensemble learning method, “SFPEL-LPI”, to predict long noncoding RNA (lncRNA)-protein interactions



Introduction

Long noncoding RNAs (lncRNAs) are a class of transcribed RNA molecules with a length of more than 200 nucleotides that do not encode proteins [1,2]. Many computational methods have been proposed to predict lncRNA-protein interactions, in order to screen candidate interactions and guide wet-lab experiments. Muppirala et al. [10] adopted the k-mer composition to encode RNA sequences and protein sequences, and used SVM and random forest to build prediction models. Wang et al. [11] used known RNA-protein interactions as positive instances, randomly selected twice as many protein-RNA pairs without interaction information as negative instances, and built prediction models with naive Bayes. A random walk with restart was implemented on the heterogeneous network to infer lncRNA-protein interactions. Yang et al. [16] proposed the Hetesim algorithm, which can predict lncRNA-protein relations based on the heterogeneous lncRNA-protein network. Ge et al. [17] proposed a computational method, “LPBNI”, based on lncRNA-protein bipartite network inference.
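To make the k-mer composition encoding mentioned above concrete, the following is a minimal sketch of how a sequence could be turned into a fixed-length feature vector. It is an illustration only, not the implementation used in [10] or in SFPEL-LPI: the function name `kmer_composition`, the default RNA alphabet `ACGU`, and the choice to skip windows containing characters outside the alphabet are all assumptions made here.

```python
from itertools import product


def kmer_composition(sequence, k, alphabet="ACGU"):
    """Return normalized k-mer frequencies for `sequence`.

    The vector has len(alphabet) ** k entries, ordered as produced by
    itertools.product over `alphabet`. Hypothetical helper for illustration.
    """
    # Enumerate every possible k-mer so the vector length is fixed.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)

    seq = sequence.upper()
    total = 0
    # Slide a window of width k over the sequence, one position at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Assumption: windows with characters outside the alphabet are skipped.
        if window in counts:
            counts[window] += 1
            total += 1

    if total == 0:
        # Sequence shorter than k, or no valid window: return an all-zero vector.
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


if __name__ == "__main__":
    features = kmer_composition("ACGUACGU", k=2)
    print(len(features))
```

For a protein sequence, the same function could be called with a 20-letter amino acid alphabet passed as `alphabet`. The resulting vectors are the kind of input a classifier such as an SVM or random forest would consume; that step is not shown here.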
