Abstract

Motivation: Computationally predicting drug-target interactions (DTIs) is a convenient strategy for identifying new DTIs at low cost and with reasonable accuracy. However, current DTI prediction methods suffer from a high false-positive rate.

Results: We developed DDR, a novel method that improves DTI prediction accuracy. DDR is based on a heterogeneous graph that contains known DTIs together with multiple similarities between drugs and multiple similarities between target proteins. DDR applies a non-linear similarity fusion method to combine the different similarities. Before fusion, DDR performs a pre-processing step in which a subset of similarities is selected by a heuristic process to obtain an optimized combination of similarities. DDR then applies a random forest model using different graph-based features extracted from the DTI heterogeneous graph. Using five repeats of 10-fold cross-validation, three testing setups, and the weighted average of the area under the precision-recall curve (AUPR) scores, we show that DDR significantly reduces the AUPR score error relative to the next best state-of-the-art method for predicting DTIs: by 31% when the drugs are new, by 23% when the targets are new, and by 34% when the drugs and the targets are known but not all DTIs between them are known. Using independent sources of evidence, we verified as correct 22 of the top 25 novel DDR predictions. This suggests that DDR can be used as an efficient method to identify correct DTIs.

Availability and implementation: The data and code are provided at https://bitbucket.org/RSO24/ddr/.

Dependencies:
Python 2.7
numpy
scikit-learn

Input format and files:
DDR expects all network files to be in adjacency-list form.
For relation files, DDR expects a tuple of drug and target on each line.
For similarity files, DDR expects a tuple of drug (target), drug (target), and their similarity on each line.

Usage:
usage: DDR.py [-h] --interaction R_FILE --DSimilarity D_SIM_FILE
              --TSimilarity T_SIM_FILE --outfile OUT_FILE
              [--no_of_splits NO_OF_SPLITS] [--K K] [--K_SNF K_SNF]
              [--T_SNF T_SNF] [--N NO_OF_TREES] [--s SPLIT]

Optional arguments:
  -h, --help                    Show this help message and exit
  --no_of_splits NO_OF_SPLITS   Number of parts to split unknown interactions. Default: 10
  --K K                         Number of nearest neighbors for drugs and targets neighborhood. Default: 5
  --K_SNF K_SNF                 Number of neighbors for similarity fusion. Default: 3
  --T_SNF T_SNF                 Number of iterations for similarity fusion. Default: 10
  --N NO_OF_TREES               Number of trees for the random forest. Default: 100
  --s SPLIT                     Split criterion for random forest trees. Default: gini

Required named arguments:
  --interaction R_FILE          Name of the file containing drug-target interaction tuples
  --DSimilarity D_SIM_FILE      Name of the file containing drug similarity file names
  --TSimilarity T_SIM_FILE      Name of the file containing target similarity file names
  --outfile OUT_FILE            Output file to write predictions
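
The input layout and command line described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the field separator (a tab here) and the one-file-name-per-line layout of the similarity list files are guesses, since the text does not specify them, and every file name, identifier (D1, T1, ...) and similarity value is hypothetical. Only the flag names come from the usage text.

```python
import os
import tempfile


def write_rows(path, rows, sep="\t"):
    """Write one tuple per line. The tab separator is an assumption."""
    with open(path, "w") as handle:
        for row in rows:
            handle.write(sep.join(str(field) for field in row) + "\n")


def build_ddr_command(r_file, d_sim_file, t_sim_file, out_file, k=5, n_trees=100):
    """Assemble a DDR.py argument list using the flags from the usage text."""
    return [
        "python", "DDR.py",
        "--interaction", r_file,
        "--DSimilarity", d_sim_file,
        "--TSimilarity", t_sim_file,
        "--outfile", out_file,
        "--K", str(k),
        "--N", str(n_trees),
    ]


# All names below are hypothetical placeholders, not files shipped with DDR.
workdir = tempfile.mkdtemp()
interactions = os.path.join(workdir, "interactions.txt")
drug_sim = os.path.join(workdir, "drug_sim_chem.txt")
target_sim = os.path.join(workdir, "target_sim_seq.txt")
drug_sim_list = os.path.join(workdir, "drug_sim_files.txt")
target_sim_list = os.path.join(workdir, "target_sim_files.txt")
predictions = os.path.join(workdir, "predictions.txt")

# Relation file: one (drug, target) tuple per line.
write_rows(interactions, [("D1", "T1"), ("D2", "T2")])

# Similarity files: one (entity, entity, similarity) tuple per line.
write_rows(drug_sim, [("D1", "D2", 0.42)])
write_rows(target_sim, [("T1", "T2", 0.17)])

# List files naming the similarity files (assumed: one file name per line).
write_rows(drug_sim_list, [(drug_sim,)])
write_rows(target_sim_list, [(target_sim,)])

cmd = build_ddr_command(interactions, drug_sim_list, target_sim_list, predictions)
```

The resulting `cmd` list could then be passed to `subprocess.call(cmd)` from a directory that contains DDR.py, in an environment with the listed dependencies installed.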
