Abstract

Associations between microRNAs (miRNAs) and diseases play a critical role in discovering disease etiology and pathogenesis. Given known miRNA-disease associations (MDAs), uncovering potential MDAs is an important problem. Most existing methods regard known MDAs as positive samples and unknown ones as negative samples, and then predict possible MDAs by iteratively revising the negative samples. However, simply treating unknown MDAs as negative samples introduces erroneous information, which may result in poor prediction performance. To avoid this defect, we present a novel method that uses only positive samples to predict MDAs by latent feature extraction (LFEMDA). We design a new approach to construct the miRNA similarity matrix. LFEMDA integrates the disease similarity matrix, the known MDAs and the miRNA similarity matrix to identify potential MDAs. By introducing miRNA and disease knowledge as auxiliary variables, the method can converge to give the optimal solution in each iteration. We conduct experiments on high-association diseases and new diseases datasets, in which our method shows better performance than other methods. We also carry out a case study on breast neoplasms to further demonstrate the capacity of our method to uncover potential MDAs.

Highlights

  • MicroRNAs, a class of small endogenous non-coding RNAs, regulate gene expression at a post-transcriptional level through mRNA degradation or translational inhibition [1,2,3]

  • Shi et al. [20] proposed a new method based on the random walk with restart (RWR) algorithm in 2013, which maps disease genes and miRNA target genes onto protein-protein interaction (PPI) networks and sets different seeds to apply the RWR algorithm

  • This paper proposes a novel approach called miRNA-disease association prediction using latent feature extraction with positive samples (LFEMDA)


Summary

Introduction

MicroRNAs (miRNAs), a class of small endogenous non-coding RNAs, regulate gene expression at a post-transcriptional level through mRNA degradation or translational inhibition [1,2,3]. This is a reasonable extension of using network-based methods to predict protein-coding genes related to diseases. To improve on their previous work, Jiang et al. [17] argued that more data sources should be introduced to increase credibility, and they proposed a new method based on genomic data fusion. Shi et al. [20] proposed a new method based on the random walk with restart (RWR) algorithm in 2013, which maps disease genes and miRNA target genes onto protein-protein interaction (PPI) networks and sets different seeds to apply the RWR algorithm. This method introduces protein data as intermediary information, which improves the accuracy and credibility of the model. LFEMDA performs well on both the high-association diseases data and the new diseases data.
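The random walk with restart (RWR) algorithm mentioned above can be illustrated with a minimal sketch. This is the generic textbook form of RWR, not the specific pipeline of Shi et al. [20]: the toy four-node network, the function name `random_walk_with_restart`, the restart probability of 0.5 and the convergence tolerance are all assumptions chosen for illustration. The walker repeatedly either follows an edge or jumps back to a seed node, and the steady-state probabilities rank nodes by proximity to the seeds.

```python
def random_walk_with_restart(adjacency, seeds, restart_prob=0.5,
                             tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: square list of lists of non-negative edge weights
               (a hypothetical toy network, not real PPI data).
    seeds:     indices of seed nodes; restarts are uniform over them.
    """
    n = len(adjacency)

    # Column-normalise the adjacency matrix so that column j holds the
    # transition probabilities for a walker leaving node j.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]

    # Initial (and restart) distribution: uniform over the seed nodes.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)

    # Iterate p <- (1 - r) * W p + r * p0 until the L1 change is below tol.
    p = p0[:]
    for _ in range(max_iter):
        p_next = [
            (1 - restart_prob) * sum(transition[i][j] * p[j] for j in range(n))
            + restart_prob * p0[i]
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(p_next, p))
        p = p_next
        if change < tol:
            break
    return p


# Hypothetical toy network: a path graph 0 - 1 - 2 - 3, seeded at node 0.
toy_network = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(toy_network, seeds=[0])
```

In this toy setting the scores decrease with distance from the seed, which is the property that makes RWR usable for ranking candidate nodes. Choosing different seed sets, as the summary describes, only changes the restart distribution `p0`.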

Disease Semantic Similarity Network
Data Fusion
Loss Function
Optimization
Prediction
Performance Evaluation
LFEMDA with different miRNA functional similarity