Abstract

Background: Accumulated evidence shows that the abnormal regulation of long non-coding RNA (lncRNA) is associated with various human diseases. Accurately identifying disease-associated lncRNAs helps to study the mechanisms of lncRNAs in diseases and to explore new therapies. Many lncRNA-disease association (LDA) prediction models have been implemented by integrating multiple kinds of data resources. However, most existing models ignore the interference of noisy and redundant information among these data resources.

Results: To improve the ability of LDA prediction models, we implemented a random forest and feature selection based LDA prediction model (RFLDA for short). First, RFLDA integrates the experiment-supported miRNA-disease associations (MDAs) and LDAs, the disease semantic similarity (DSS), the lncRNA functional similarity (LFS) and the lncRNA-miRNA interactions (LMI) as input features. Then, RFLDA chooses the most useful features for training the prediction model by feature selection based on the random forest variable importance score, which takes into account not only the effect of each individual feature on the prediction results but also the joint effects of multiple features. Finally, a random forest regression model is trained to score potential lncRNA-disease associations. With an area under the receiver operating characteristic curve (AUC) of 0.976 and an area under the precision-recall curve (AUPR) of 0.779 under 5-fold cross-validation, RFLDA performs better than several state-of-the-art LDA prediction models. Moreover, case studies on three cancers demonstrate that 43 of the 45 lncRNAs predicted by RFLDA are validated by experimental data, and the other two predicted lncRNAs are supported by other LDA prediction models.

Conclusions: Cross-validation and case studies indicate that RFLDA has an excellent ability to identify potential disease-associated lncRNAs.
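The feature-integration step in the Results can be illustrated with a minimal sketch. It assumes that an lncRNA-disease pair is described by concatenating the lncRNA's rows in the LFS, LDA and LMI matrices with the disease's entries in the DSS, LDA and MDA matrices. This layout, the function names and the toy matrices are illustrative assumptions, not the exact construction used by the authors.

```python
from typing import List

Matrix = List[List[float]]


def column(m: Matrix, j: int) -> List[float]:
    """Return column j of a row-major matrix as a list."""
    return [row[j] for row in m]


def pair_features(i: int, j: int, lfs: Matrix, lda: Matrix, lmi: Matrix,
                  dss: Matrix, mda: Matrix) -> List[float]:
    """Build one feature vector for lncRNA i and disease j (assumed layout).

    lfs: lncRNA x lncRNA functional similarity
    lda: lncRNA x disease known associations (0/1)
    lmi: lncRNA x miRNA interactions (0/1)
    dss: disease x disease semantic similarity
    mda: miRNA x disease known associations (0/1)
    """
    lnc_part = lfs[i] + lda[i] + lmi[i]                   # lncRNA-side descriptors
    dis_part = dss[j] + column(lda, j) + column(mda, j)   # disease-side descriptors
    return lnc_part + dis_part


# Hypothetical toy data: 2 lncRNAs, 2 diseases, 3 miRNAs.
lfs = [[1.0, 0.4], [0.4, 1.0]]
lda = [[1.0, 0.0], [0.0, 1.0]]
lmi = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
dss = [[1.0, 0.3], [0.3, 1.0]]
mda = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

features = pair_features(0, 1, lfs, lda, lmi, dss, mda)
```

Under this layout each pair yields 2 × (number of lncRNAs + diseases + miRNAs) features, which is why a feature-selection step before training the regression model is attractive: many of these columns are likely to be redundant or uninformative.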

Highlights

  • Accumulated evidence shows that the abnormal regulation of long non-coding RNA is associated with various human diseases

  • Under the supposition that lncRNAs with similar functions tend to be related to diseases with similar phenotypes, Sun et al. proposed a lncRNA-disease association (LDA) prediction model named RWRlncD by implementing random walk with restart (RWR) on a lncRNA functional similarity (LFS) network [10]

  • Feature selection: To determine how many features should be used to train the random forest regression model, we studied the prediction accuracy of models on different training sample sets by 10-fold cross-validation


Summary

Introduction

Accumulated evidence shows that the abnormal regulation of long non-coding RNA (lncRNA) is associated with various human diseases. Under the supposition that the more miRNAs two lncRNAs both interact with, the more likely they are to be related to similar diseases, Zhou et al. proposed an LDA prediction model by implementing a random walk on a heterogeneous network that integrated the disease similarity network, the miRNA-mediated lncRNA crosstalk network and the experiment-supported LDA network [11]. Chen et al. developed an improved RWR based LDA prediction model (IRWRLDA), which set the initial probability vector of RWR by combining the lncRNA expression similarity with the DSS [13]. Both of the above methods can be applied to new diseases that have no experiment-supported associated lncRNAs. Yu et al. implemented a bi-random walks based LDA prediction model (BRWLDA) [14]. Xie et al. implemented a similarity kernel fusion based LDA prediction model (SFK-LDA) by fusing the DSS with cosine similarity for diseases, and the lncRNA expression similarity with cosine similarity for lncRNAs [24].
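Several of the models above build on random walk with restart (RWR). As a minimal sketch of the standard update p(t+1) = (1 - r) W p(t) + r p(0), where W is the column-normalized adjacency matrix, r is the restart probability and p(0) is the initial probability vector, the following code ranks nodes of a small network by their proximity to a set of seed nodes. The function name `rwr`, the uniform seed initialization and the toy graph are assumptions for illustration; the cited models each use their own networks and initial vectors.

```python
from typing import List, Sequence


def rwr(adj: List[List[float]], seeds: Sequence[int], restart: float = 0.5,
        tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Random walk with restart on a weighted graph given as an adjacency matrix."""
    n = len(adj)
    # Column-normalize so each column of w sums to 1 (or stays 0 for isolated nodes).
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # Initial probability vector: uniform over the seed nodes (an assumption).
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1.0 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        # Stop when the L1 change between iterations is below the tolerance.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network: a path 0 - 1 - 2 - 3, with node 0 as the only seed.
path_graph = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
scores = rwr(path_graph, seeds=[0])
```

In an LDA setting, the nodes would be lncRNAs (or lncRNAs and diseases in a heterogeneous network), the seeds would be the entities already known to be associated with the query, and the stationary scores would rank the remaining candidates. Changing how p(0) is set, as IRWRLDA does, changes which candidates the walk favors without altering the update rule.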

