Abstract

Many long non-coding RNAs (lncRNAs) exert their effects by interacting with their corresponding RNA-binding proteins, so identifying lncRNA-protein interactions is important for understanding lncRNA function. Because experimental methods are time-consuming and laborious, a growing number of computational models have been proposed to predict lncRNA-protein interactions. However, few models effectively exploit the biological network topology of lncRNAs (proteins) in combination with their sequence structure features, and most cannot make effective predictions for new proteins (lncRNAs) that have no known interactions with any lncRNA (protein). In this study, we propose a projection-based neighborhood non-negative matrix decomposition model (PMKDN) that predicts potential lncRNA-protein interactions by integrating multiple biological features of lncRNAs (proteins). First, we extract multiple features of lncRNAs (proteins) from lncRNA (protein) sequences and lncRNA expression profile data. Second, based on protein Gene Ontology (GO) annotations, lncRNA sequences, lncRNA (protein) feature information, and a modified lncRNA-protein interaction network, we calculate multiple lncRNA (protein) similarities and fuse them to obtain a more accurate lncRNA (protein) similarity network. Finally, combining the similarity and feature information of lncRNAs (proteins) with the modified interaction network, we apply a projection-based neighborhood non-negative matrix decomposition algorithm to predict potential lncRNA-protein interactions. On two benchmark datasets, PMKDN outperformed other state-of-the-art methods in predicting new lncRNA-protein interactions, new lncRNAs, and new proteins. A case study further indicates that PMKDN can serve as an effective tool for lncRNA-protein interaction prediction.
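
To make the matrix decomposition idea concrete, the following is a minimal sketch of plain non-negative matrix factorization applied to a toy lncRNA-protein interaction matrix. It is not the PMKDN algorithm: the projection step, the neighborhood information, and the fused similarity networks described above are all omitted, and the function name `nmf_scores`, the rank, the iteration count, and the toy matrix are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def nmf_scores(Y, rank=2, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix Y (lncRNAs x proteins) as W @ H.

    Uses the standard multiplicative update rules for the squared
    Frobenius error. Hypothetical helper, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    n_lnc, n_prot = Y.shape
    # Random non-negative initialization of the two factor matrices.
    W = rng.random((n_lnc, rank)) + eps
    H = rng.random((rank, n_prot)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        W *= (Y @ H.T) / (W @ H @ H.T + eps)
    # The reconstructed matrix serves as a table of interaction scores.
    return W @ H

# Toy interaction matrix (assumed data): rows are lncRNAs, columns are
# proteins, 1 marks a known interaction and 0 an unobserved pair.
Y = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

scores = nmf_scores(Y)
```

In a setting like this, unobserved pairs would typically be ranked by their reconstructed scores, with higher-scoring pairs treated as candidate interactions.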

Highlights

  • RNA represents the direct output of genomically encoded genetic information, and a large part of the regulatory capacity of cells focuses on the synthesis, processing, transport, modification, and translation of RNA

  • The performance of the interaction prediction method was evaluated by 5-fold cross-validation (CV), using the area under the ROC curve (AUC), the area under the precision-recall curve (AUPR), and the F1 score (F1) as evaluation metrics

  • We proposed a new long non-coding RNA (lncRNA)-protein interaction prediction model that can predict unknown interactions between lncRNAs and proteins and has strong predictive ability for new lncRNAs and new proteins
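
The evaluation metrics named in the highlights can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's evaluation code: the function names are hypothetical, AUPR is approximated here by average precision (one common estimator of the area under the precision-recall curve), and the labels and scores are made-up toy values.

```python
import numpy as np

def auc_score(y_true, y_score):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly; ties count as half."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def average_precision(y_true, y_score):
    """Average precision: mean of precision@k over the ranks k that
    hold a positive, used here as an AUPR estimate."""
    order = np.argsort(-y_score, kind="stable")
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return (precision_at_k * hits).sum() / hits.sum()

def f1_score(y_true, y_pred):
    """F1 for binary labels and binary predictions."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return float(2 * tp / (2 * tp + fp + fn)) if tp else 0.0

def five_fold_splits(n_items, seed=0):
    """Shuffle item indices and split them into 5 disjoint test folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_items), 5)

# Toy labels and scores (assumed data) for candidate lncRNA-protein pairs.
y_true = np.array([1, 0, 1, 0, 0, 1])
y_score = np.array([0.9, 0.4, 0.7, 0.8, 0.1, 0.6])

folds = five_fold_splits(len(y_true))
auc = auc_score(y_true, y_score)
aupr = average_precision(y_true, y_score)
f1 = f1_score(y_true, (y_score >= 0.5).astype(int))
```

In a 5-fold setting, each fold in `folds` would be held out in turn, the model fitted on the remaining pairs, and the metrics averaged across folds. The 0.5 threshold used for F1 is an arbitrary choice for the sketch.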


Summary

Introduction

RNA represents the direct output of genomically encoded genetic information, and a large part of the regulatory capacity of cells focuses on the synthesis, processing, transport, modification, and translation of RNA. A large proportion of human genes exert their functions through non-coding RNA (ncRNA) (Mattick, 2005). Long non-coding RNA (lncRNA) is an important type of ncRNA, comprising transcripts longer than 200 nucleotides with no obvious protein-coding function (Volders et al., 2013). With the development of bioinformatics, researchers have become increasingly aware of the important roles of lncRNAs in various biological processes: lncRNAs are involved in regulating gene expression and function across multiple networks, affect the formation of nuclear structural domains and the transcriptional state of whole chromosomes, and participate in interactions between two different chromosomal regions through mechanisms that directly regulate chromosome structure (Batista and Chang, 2013). Because detecting large-scale lncRNA-protein interactions experimentally is expensive and time-consuming, a large number of computational models have been proposed based on existing experimental data (Suresh et al., 2015).
