PIWI-interacting RNAs (piRNAs) are a class of small non-coding RNAs that are essential for gene regulation and genome stability, among other functions. Accumulating studies have revealed that piRNAs have significant potential as biomarkers and therapeutic targets for a variety of diseases. However, current computational methods struggle to capture piRNA-disease associations (PDAs) effectively from limited data. In this study, we propose a novel method, MRDPDA, for predicting PDAs from limited data drawn from multiple sources. Specifically, MRDPDA integrates a deep factorization machine (deepFM) model with regularization terms derived from multiple yet limited datasets, using a separate Laplacian for each similarity source instead of a single averaged similarity network. Moreover, a unified objective function that incorporates a similarity-based embedding loss is proposed, so that the learned embeddings are suited to the prediction task. In addition, a balanced benchmark dataset based on piRPheno is constructed, and a deep autoencoder is applied to create a reliable negative set from the unlabeled data. Compared with three recent methods, MRDPDA achieves the best performance on the piRPheno dataset in both five-fold cross-validation and an independent test set, and case studies further demonstrate the effectiveness of MRDPDA.
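To make the idea of regularizing with separate Laplacians concrete, the following is a minimal NumPy sketch, not the MRDPDA implementation. The abstract does not give the exact objective, so the penalty form tr(EᵀLE), the per-source weights, and all function names here are illustrative assumptions.

```python
import numpy as np

def graph_laplacian(S):
    """Unnormalized Laplacian L = D - S of a (symmetrized) similarity matrix."""
    S = (S + S.T) / 2.0
    return np.diag(S.sum(axis=1)) - S

def separate_laplacian_penalty(E, similarities, weights=None):
    """Assumed form: sum_k w_k * tr(E^T L_k E), one Laplacian per source.

    E            : (n, d) embedding matrix, one row per piRNA (or disease)
    similarities : list of (n, n) similarity matrices from different sources
    weights      : hypothetical per-source weights; defaults to 1 for each
    """
    if weights is None:
        weights = [1.0] * len(similarities)
    return sum(w * np.trace(E.T @ graph_laplacian(S) @ E)
               for w, S in zip(weights, similarities))

def averaged_laplacian_penalty(E, similarities):
    """Baseline: a single Laplacian built from the averaged similarity network."""
    S_avg = np.mean(similarities, axis=0)
    return np.trace(E.T @ graph_laplacian(S_avg) @ E)
```

Because the Laplacian is linear in the similarity matrix, equal weights reduce the separate penalty to a scaled version of the averaged one. In this sketch the two only differ when the sources are weighted unequally, which is what keeping the terms separate makes possible.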