Link Prediction Only With Interaction Data and its Application on Drug Repositioning

Guangsheng Wu, Zhiqun Zuo, Juan Liu

doi:10.1109/tnb.2020.2990291

Abstract

To assist drug development, many computational methods have been proposed to identify potential drug-disease treatment associations before wet-lab experiments. Based on the assumption that similar drugs may treat similar diseases, most of these methods compute drug and disease similarities from various chemical or biological features. However, because such features may be unknown or hard to collect, these methods do not work when the data are incomplete. In addition, because validated negative samples are lacking in drug-disease association data, most methods simply select some unlabeled samples as negatives, which may introduce noise and reduce the reliability of prediction. Herein, we propose a new method, TS-SVD, that uses only known drug-protein, disease-protein and drug-disease interactions to predict potential drug-disease associations. In a constructed drug-protein-disease heterogeneous network, we assume that drugs (diseases) linked to common proteins or diseases (drugs) may be similar; we therefore compute the common-neighbor count matrices of drugs and of diseases and convert them into topological similarity matrices. We then obtain low-dimensional embedding representations of drug-disease pairs from these topological features using singular value decomposition. Finally, a Random Forest classifier is trained to make the predictions. To train a more reliable model, we select reliable negative samples based on the k-step neighbor relationships between drugs and diseases. Compared with several state-of-the-art methods, TS-SVD uses less information yet achieves better or comparable performance. Moreover, our strategy for selecting reliable negative samples also improves the performance of these methods. Case studies further demonstrate the practicality of our method in discovering novel drug-disease associations.
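The abstract describes the TS-SVD pipeline only at a high level. The Python/NumPy sketch below illustrates one plausible reading of its steps on a toy network: common-neighbor counts in the drug-protein-disease network, conversion of those counts to a similarity, truncated-SVD embeddings of drugs and diseases concatenated into pair features, and negative sampling that keeps only drug-disease pairs not connected within k steps. The cosine-style normalization, the U·sqrt(Σ) embedding, the walk-based reading of "k-step neighbors", all function names and the toy data are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np


def topological_similarity(neighbors: np.ndarray) -> np.ndarray:
    """Convert a binary node-by-neighbor matrix into a topological similarity.

    counts[i, j] = number of neighbors shared by nodes i and j (diagonal = degree).
    ASSUMPTION: counts are normalized cosine-style, c_ij / sqrt(c_ii * c_jj).
    """
    counts = (neighbors @ neighbors.T).astype(float)
    scale = np.sqrt(np.diag(counts))
    scale[scale == 0] = 1.0  # isolated nodes keep an all-zero row
    return counts / np.outer(scale, scale)


def svd_embedding(similarity: np.ndarray, dim: int) -> np.ndarray:
    """Rank-`dim` embedding from a truncated SVD: rows of U_k * sqrt(sigma_k)."""
    u, sigma, _ = np.linalg.svd(similarity)
    return u[:, :dim] * np.sqrt(sigma[:dim])


def heterogeneous_adjacency(a_dp, a_sp, a_ds) -> np.ndarray:
    """Symmetric adjacency of the drug-protein-disease network.

    Node order: drugs, then proteins, then diseases.
    """
    n_d, n_p = a_dp.shape
    n_s = a_sp.shape[0]
    return np.block([
        [np.zeros((n_d, n_d)), a_dp, a_ds],
        [a_dp.T, np.zeros((n_p, n_p)), a_sp.T],
        [a_ds.T, a_sp, np.zeros((n_s, n_s))],
    ])


def reliable_negatives(a_dp, a_sp, a_ds, k: int) -> list:
    """Drug-disease pairs NOT connected by any walk of length <= k.

    ASSUMPTION: one possible reading of the k-step-neighbor criterion.
    """
    adj = (heterogeneous_adjacency(a_dp, a_sp, a_ds) > 0).astype(int)
    n = adj.shape[0]
    frontier = np.eye(n, dtype=int)  # walks of length 0
    reach = np.eye(n, dtype=bool)
    for _ in range(k):
        # After t iterations: nodes reachable by a walk of exactly t steps.
        frontier = ((frontier @ adj) > 0).astype(int)
        reach |= frontier.astype(bool)
    n_d, n_s = a_ds.shape
    offset = n - n_s  # index of the first disease node
    return [(i, j) for i in range(n_d) for j in range(n_s)
            if not reach[i, offset + j]]


def pair_features(drug_emb, disease_emb, pairs) -> np.ndarray:
    """Concatenate drug and disease embeddings for each (drug, disease) pair."""
    rows = [i for i, _ in pairs]
    cols = [j for _, j in pairs]
    return np.hstack([drug_emb[rows], disease_emb[cols]])


# Hypothetical toy network: 3 drugs, 3 proteins, 3 diseases.
a_dp = np.array([[1, 0, 0], [1, 0, 0], [0, 0, 1]])  # drug-protein
a_sp = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 0]])  # disease-protein
a_ds = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])  # known drug-disease

# A drug's neighbors are proteins and diseases; a disease's are proteins and drugs.
drug_sim = topological_similarity(np.hstack([a_dp, a_ds]))
disease_sim = topological_similarity(np.hstack([a_sp, a_ds.T]))

positives = [(0, 0)]
negatives = reliable_negatives(a_dp, a_sp, a_ds, k=3)
X = pair_features(svd_embedding(drug_sim, 2), svd_embedding(disease_sim, 2),
                  positives + negatives)
y = np.array([1] * len(positives) + [0] * len(negatives))
```

In this toy network, drug 1 shares protein 0 with disease 0, so the pair (1, 0) lies within 3 steps and is excluded from the negatives even though it is unlabeled; this is the kind of ambiguous pair the negative-selection strategy is meant to avoid. The resulting `X` and `y` could then be passed to a classifier such as scikit-learn's `RandomForestClassifier` via `fit(X, y)`.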

Full Text