Abstract

A growing number of studies have shown that microRNAs (miRNAs) play a key role in many important biological processes. Dysregulation of miRNAs can lead to a variety of diseases, including cancers, so predicting potential miRNA-disease associations is important for understanding disease pathogenesis and for supporting diagnosis, treatment and drug development. Experimental validation of miRNA-disease associations typically involves miRNA knockout or knockdown, which is time-consuming and labor-intensive. Computational models have therefore been developed to predict unknown miRNA-disease associations from available information on miRNAs, diseases, genes, and so on. However, their performance still leaves room for improvement. Noting that appropriately combining multiple data sources usually improves prediction accuracy, we developed IMDAILM (Inferring miRNA-Disease Association by integrating lncRNA and miRNA data), a low-rank matrix completion model that integrates miRNA, long noncoding RNA (lncRNA) and disease information to predict miRNA-disease associations. Specifically, the miRNA-disease association network and the lncRNA-disease association network are fused into a new heterogeneous network with three types of nodes, representing miRNAs, lncRNAs and diseases. In addition, a negative sample inference method is proposed to infer unrelated miRNA-disease pairs. Based on both the heterogeneous network and the negative samples, a low-rank matrix completion model is formulated and solved. Under 5-fold cross-validation (CV), IMDAILM achieved an area under the curve (AUC) of 0.8884 for predicting miRNAs associated with diseases, outperforming a few recent methods. IMDAILM also yielded an AUC of 0.8870 for predicting both lncRNAs and miRNAs associated with diseases. The 5-fold CV results further indicate that IMDAILM is superior to other methods in predicting miRNAs associated with isolated diseases. Finally, we confirmed, through literature mining, a few novel predicted miRNAs associated with specific diseases such as lung cancers. In summary, integrating lncRNA information into a matrix completion framework contributes to the prediction of miRNA-disease associations.
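
To make the idea of completing a fused association matrix concrete, the following is a minimal sketch, not the IMDAILM model itself. It assumes a rank-1 factorization fitted by alternating least squares on observed entries only, which stands in for the low-rank matrix completion model named in the abstract. The function name `complete_rank1`, the toy matrix, and the hand-picked "negative" entries are all hypothetical; the abstract does not describe how negatives are inferred.

```python
def complete_rank1(matrix, mask, n_iters=50, reg=0.01):
    """Fit matrix[i][j] ~ u[i] * v[j] using only entries where mask[i][j] is True.

    Rank-1 alternating least squares with a small ridge penalty `reg`.
    Returns the dense score matrix u[i] * v[j] for every cell,
    including the unobserved ones.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iters):
        # Update each row factor with the column factors held fixed.
        for i in range(n_rows):
            num = sum(matrix[i][j] * v[j] for j in range(n_cols) if mask[i][j])
            den = sum(v[j] ** 2 for j in range(n_cols) if mask[i][j]) + reg
            u[i] = num / den
        # Update each column factor with the row factors held fixed.
        for j in range(n_cols):
            num = sum(matrix[i][j] * u[i] for i in range(n_rows) if mask[i][j])
            den = sum(u[i] ** 2 for i in range(n_rows) if mask[i][j]) + reg
            v[j] = num / den
    return [[u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]


# Toy fused matrix: rows are two miRNAs followed by one lncRNA, columns are diseases.
# 1 = known association, 0 = assumed negative; masked-out cells are unknown.
toy = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
]
observed = [
    [True, True, False],   # (row 0, col 2) is unknown
    [True, True, True],
    [True, False, True],   # (row 2, col 1) is unknown
]
scores = complete_rank1(toy, observed)
```

In this toy case the unknown cell in row 2, column 1 receives a high score because row 2 agrees with the other rows on the cells that are observed, while the unknown cell in row 0, column 2 stays at zero because every observed entry in that column is a negative. Without any observed negatives, a fit like this would score every cell close to 1, which is one way to see why the abstract treats negative samples as an input to the model.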

Highlights

  • According to the central dogma of molecular biology, genetic information is stored in protein-coding genes, and RNA is merely the intermediary between DNA and the proteins it encodes [1], [2]

  • In this study, we integrated the miRNA-disease association network and the long noncoding RNA (lncRNA)-disease association network, and applied a predictive method based on low-rank matrix completion to predict miRNA-disease associations

  • The miRNA-disease and lncRNA-disease association data were obtained from known databases and integrated into a lncRNA+miRNA-disease association network, which increased the amount of disease-related information and helped improve prediction performance
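
The integration step described in the highlights can be sketched as stacking two binary association matrices over a shared set of disease columns. This is an illustrative assumption about the data layout, not the authors' code: the function names `build_association_matrix` and `fuse_networks` and the placeholder identifiers (`miRNA_a`, `lncRNA_x`, `disease_1`, ...) are made up, and real inputs would come from the association databases the study used.

```python
def build_association_matrix(pairs, row_names, diseases):
    """Return a 0/1 matrix with one row per RNA and one column per disease."""
    row_index = {name: i for i, name in enumerate(row_names)}
    col_index = {name: j for j, name in enumerate(diseases)}
    matrix = [[0] * len(diseases) for _ in row_names]
    for rna, disease in pairs:
        matrix[row_index[rna]][col_index[disease]] = 1
    return matrix


def fuse_networks(mirna_pairs, lncrna_pairs):
    """Stack miRNA rows on top of lncRNA rows over the union of diseases.

    Assumes miRNA and lncRNA identifiers never collide.
    """
    mirnas = sorted({rna for rna, _ in mirna_pairs})
    lncrnas = sorted({rna for rna, _ in lncrna_pairs})
    diseases = sorted({d for _, d in mirna_pairs} | {d for _, d in lncrna_pairs})
    rows = mirnas + lncrnas
    all_pairs = list(mirna_pairs) + list(lncrna_pairs)
    return rows, diseases, build_association_matrix(all_pairs, rows, diseases)


# Placeholder association lists (hypothetical names, not real data).
mirna_pairs = [("miRNA_a", "disease_1"), ("miRNA_b", "disease_2")]
lncrna_pairs = [("lncRNA_x", "disease_2"), ("lncRNA_x", "disease_3")]

rows, diseases, fused = fuse_networks(mirna_pairs, lncrna_pairs)
```

Here `disease_3` appears only in the lncRNA data, yet it still gets a column in the fused matrix. That is the sense in which adding lncRNA associations increases the disease-related information available when predicting for the miRNA rows.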


Summary

Introduction

According to the central dogma of molecular biology, genetic information is stored in protein-coding genes, and RNA is merely the intermediary between DNA and the proteins it encodes [1], [2]. [...] do not encode protein, resulting in tens of thousands of noncoding RNAs (ncRNAs). NcRNAs can be further divided into microRNAs (miRNAs), long noncoding RNAs (lncRNAs) and so on. MiRNAs are ncRNAs about 22 nucleotides long, while lncRNAs are usually longer than 200 nucleotides. Both types of ncRNA are critical for posttranscriptional gene regulation: they bind to complementary regions of messenger transcripts, thereby inhibiting translation or regulating degradation [3]–[6].

The associate editor coordinating the review of this manuscript and approving it for publication was Xiangtao Li.

