Abstract

Long noncoding RNAs (lncRNAs) are a class of noncoding RNA molecules longer than 200 nucleotides. Recent studies have uncovered their functional roles in diverse cellular processes and in tumorigenesis. Identifying novel disease-related lncRNAs might therefore deepen our understanding of disease etiology. However, because relatively few associations between lncRNAs and diseases have been verified, reliably and effectively predicting the lncRNAs associated with a given disease remains challenging. In this paper, we propose a novel multiview consensus graph learning method to infer potential disease-related lncRNAs. Specifically, we first construct a set of similarity matrices for lncRNAs and diseases from the known associations. We then iteratively learn a consensus graph from the multiple input matrices while simultaneously optimizing the predicted association probabilities within a multi-label learning framework. To demonstrate the utility of our method, we compare it with three state-of-the-art methods on three widely used datasets. The experimental results show that our method achieves the best prediction performance under different cross-validation schemes. A case study on uterine cervical neoplasms further confirmed the utility of our method in identifying lncRNAs as potential prognostic biomarkers in practice.
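The two preprocessing steps described in the abstract can be illustrated with a minimal sketch. It assumes that the association-based similarity is the Gaussian interaction profile (GIP) kernel, a common choice in this literature, and it replaces the paper's consensus graph learning with a simpler, plainly different stand-in: an auto-weighted average in which each view A_v is re-weighted by 1 / (2 ||S - A_v||_F), a Weiszfeld-style update toward the geometric median of the views. The multi-label prediction step is not covered. The function names (gip_similarity, consensus_graph) and all toy matrices are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def gip_similarity(profiles: Matrix) -> Matrix:
    """Gaussian interaction profile (GIP) kernel between the rows of a 0/1
    association matrix (an assumed choice of association-based similarity)."""
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in row) for row in profiles) / n
    # Bandwidth normalised by the mean squared profile norm.
    gamma = 1.0 / mean_sq_norm if mean_sq_norm > 0 else 1.0
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


def frobenius_distance(a: Matrix, b: Matrix) -> float:
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def consensus_graph(views: List[Matrix], iters: int = 30, eps: float = 1e-12) -> Matrix:
    """Simplified stand-in for consensus graph learning (NOT the paper's method):
    alternate S = weighted mean of the views and w_v = 1 / (2 * ||S - A_v||_F)."""
    n = len(views[0])
    weights = [1.0] * len(views)
    consensus: Matrix = []
    for _ in range(iters):
        total = sum(weights)
        consensus = [
            [sum(w * v[i][j] for w, v in zip(weights, views)) / total for j in range(n)]
            for i in range(n)
        ]
        # Views far from the current consensus receive smaller weights.
        weights = [1.0 / (2.0 * frobenius_distance(consensus, v) + eps) for v in views]
    return consensus


# Toy data: 3 lncRNAs x 3 diseases (hypothetical known associations).
assoc = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]
lnc_gip = gip_similarity(assoc)
disease_gip = gip_similarity([list(col) for col in zip(*assoc)])  # columns = diseases

# Two further hypothetical lncRNA similarity views (e.g. expression, sequence).
expr_sim = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.3], [0.1, 0.3, 1.0]]
seq_sim = [[1.0, 0.4, 0.2], [0.4, 1.0, 0.5], [0.2, 0.5, 1.0]]
views = [lnc_gip, expr_sim, seq_sim]
S = consensus_graph(views)
```

Because every weight is a positive scalar, the consensus stays a convex combination of the input views, so it inherits their symmetry and unit diagonal.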

Highlights

  • With the completion of the ENCODE project, researchers found that only about 2% of the human genome encodes proteins, while approximately 75% of the human genome is covered by primary transcripts (Djebali et al, 2012; Li and Chang, 2014; Zhang et al, 2018b)

  • Leave-One-Out Cross Validation (LOOCV) holds out only one known association at a time as the test sample, whereas in five-fold Cross Validation (CV) all known associations are randomly divided into five parts and each part is used in turn as the test set

  • The Receiver Operating Characteristic (ROC) curve was plotted from the cross-validation results, and the Area Under the ROC Curve (AUC) was calculated to measure prediction accuracy
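The two validation schemes and the AUC metric in the highlights can be sketched as follows. This is an illustrative sketch, not the authors' evaluation code: k_fold_splits and auc are hypothetical helpers, LOOCV is treated as the special case k = number of known associations, and AUC is computed with the rank (Mann-Whitney) formulation, which equals the area under the ROC curve. In this setting, held-out associations are commonly scored against unobserved lncRNA–disease pairs treated as negatives; the scores below are made up.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (lncRNA index, disease index)


def k_fold_splits(known: Sequence[Pair], k: int = 5, seed: int = 0) -> List[List[Pair]]:
    """Shuffle the known associations and deal them into k folds of near-equal size.
    With k == len(known) every fold holds exactly one association (LOOCV)."""
    pairs = list(known)
    random.Random(seed).shuffle(pairs)
    return [pairs[i::k] for i in range(k)]


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2);
    this equals the area under the ROC curve."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical known associations.
known = [(0, 0), (0, 2), (1, 0), (2, 1), (3, 2), (4, 1), (5, 0), (6, 2), (7, 1), (8, 0)]
folds = k_fold_splits(known)              # five-fold CV: 5 folds of 2 pairs each
loo = k_fold_splits(known, k=len(known))  # LOOCV: 10 folds of 1 pair each

# Held-out positives vs. unobserved pairs (hypothetical scores).
score = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])  # 8 of the 9 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of scores, which is fine for a sketch; a sort-based rank computation gives the same value faster on large candidate sets.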

Introduction

With the completion of the ENCODE project, researchers found that only about 2% of the human genome encodes proteins, while approximately 75% of the human genome is covered by primary transcripts (Djebali et al, 2012; Li and Chang, 2014; Zhang et al, 2018b). Chen et al improved the random walk with restart framework by initializing the probability vector with an integration of lncRNA expression similarity and disease semantic similarity (Chen et al, 2016). Yu et al applied a collaborative filtering model together with a naive Bayesian classifier on a constructed lncRNA–miRNA–disease tripartite network to predict novel lncRNA–disease associations effectively (Yu et al, 2019). Xie et al and Chen et al both first fused different similarity matrices for lncRNAs and diseases using a similarity kernel fusion model and then applied different classification frameworks to predict potential associations (Chen et al, 2019; Xie et al, 2019). Newly constructed features were fed to a rotation forest classifier to classify lncRNA–disease associations, achieving remarkable performance.
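The random walk with restart (RWR) framework mentioned above iterates p_{t+1} = (1 - r) W p_t + r p_0 over a column-normalized similarity network W until the visiting probabilities converge. The sketch shows only this generic iteration. The initial vector p0, the toy network, the restart probability r = 0.3, and the function names are hypothetical, and how Chen et al derived p0 from lncRNA expression and disease semantic similarity is not reproduced here.

```python
from typing import List

Matrix = List[List[float]]


def column_normalize(w: Matrix) -> Matrix:
    """Divide each column by its sum so that W becomes column-stochastic."""
    n = len(w)
    col_sums = [sum(w[i][j] for i in range(n)) for j in range(n)]
    return [[w[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)] for i in range(n)]


def random_walk_with_restart(
    w: Matrix, p0: List[float], restart: float = 0.3, tol: float = 1e-10, max_iter: int = 1000
) -> List[float]:
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change drops below tol."""
    m = column_normalize(w)
    n = len(p0)
    p = list(p0)
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(m[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical 4-node similarity network; the walk restarts at node 0.
w = [
    [0.0, 1.0, 0.5, 0.0],
    [1.0, 0.0, 0.8, 0.0],
    [0.5, 0.8, 0.0, 0.9],
    [0.0, 0.0, 0.9, 0.0],
]
p0 = [1.0, 0.0, 0.0, 0.0]
p = random_walk_with_restart(w, p0)
```

Assuming no isolated nodes, W is column-stochastic and p0 sums to 1, so every iterate also sums to 1, and the (1 - r) factor makes the update a contraction, which guarantees convergence.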
