Large-Scale Prediction of Drug-Target Interaction: a Data-Centric Review

Tiejun Cheng, Yanli Wang, Stephen H Bryant, Ming Hao, Takako Takeda

doi:10.1208/s12248-017-0092-6

Abstract

The prediction of drug-target interactions (DTIs) is of extraordinary significance to modern drug discovery, both for suggesting new drug candidates and for repositioning old drugs. Despite technological advances, large-scale experimental determination of DTIs is still expensive and laborious, so effective, low-cost computational alternatives are still strongly needed. Meanwhile, open-access resources have been growing rapidly, with massive amounts of bioactivity data becoming available, creating unprecedented opportunities for the development of novel in silico models for large-scale DTI prediction. In this work, we review the state-of-the-art computational approaches for identifying DTIs from a data-centric perspective: what the underlying data are and how they are utilized in each study. We also summarize popular public data resources and online tools for DTI prediction. We find that various types of data have been employed, including properties of chemical structures, drug therapeutic effects and side effects, drug-target binding, drug-drug interactions, bioactivity data of drug molecules across multiple biological targets, and drug-induced gene expression. More often, heterogeneous data were integrated to offer better performance. However, challenges remain, such as handling data imbalance, incorporating negative samples and quantitative bioactivity data, and maintaining cross-links among different data sources, which are essential for large-scale and automated information integration.

Highlights

Human health nowadays has been considerably improved through medical interventions
We focus on a subset of public databases directly relevant to drug-target interaction (DTI) prediction according to our survey (Table I)
We have reviewed public databases, online tools, and recent applications relevant to DTI prediction from the data perspective

Summary

INTRODUCTION

Human health nowadays has been considerably improved through medical interventions. However, many diseases remain poorly treated while new ones are emerging.

Cobanoglu et al developed an active learning method with probabilistic matrix factorization (PMF), which is useful for analyzing large interaction networks [48] because it is independent of chemical, structural, or other similarity metrics and its computation time scales linearly with the number of known interactions.

Probably the most intuitive approach is to predict novel DTIs for a query drug from a similar drug with known targets.

The availability of chemical biology data across multiple assays for a common compound library enables the generation of bioactivity profiles, which can be informative for predicting DTIs. For example, Cheng et al developed a bioactivity profile similarity search (BASS) method for associating targets to small molecules by using the known target annotations of related compounds [58].

Using an ontology-based data representation of the relationships among drugs, diseases, genes, pathways, and SNPs, Tao et al successfully identified potential targets for colorectal cancer drugs through semantic reasoning [80].
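The similarity-based idea mentioned above, inferring targets for a query drug from its most similar drug with known targets, can be illustrated with a minimal sketch. This is not the implementation of any method cited in this review. It assumes that each drug is represented as a set of "on" fingerprint bit indices and that similarity is measured with the Tanimoto coefficient. The drug names, target names, fingerprints, and the helper functions `tanimoto` and `predict_targets` are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_targets(
    query_fp: Set[int],
    known_fps: Dict[str, Set[int]],
    known_targets: Dict[str, Set[str]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Transfer target annotations from the k most similar known drugs.

    Each candidate target is scored by the highest similarity among the
    neighbours that are annotated with it.
    """
    # Rank known drugs by similarity to the query, most similar first.
    ranked = sorted(
        ((tanimoto(query_fp, fp), name) for name, fp in known_fps.items()),
        reverse=True,
    )
    scores: Dict[str, float] = {}
    for sim, name in ranked[:k]:
        for target in known_targets.get(name, set()):
            scores[target] = max(scores.get(target, 0.0), sim)
    # Sort by descending score, then by target name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: fingerprints are sets of bit indices.
KNOWN_FPS = {"drug_A": {1, 2, 3, 4}, "drug_B": {7, 8, 9}}
KNOWN_TARGETS = {"drug_A": {"T1", "T2"}, "drug_B": {"T3"}}
QUERY_FP = {1, 2, 3, 5}

PREDICTIONS = predict_targets(QUERY_FP, KNOWN_FPS, KNOWN_TARGETS, k=1)
```

With this toy data the query shares three of five distinct bits with `drug_A` and none with `drug_B`, so only the targets of `drug_A` are proposed. In practice the fingerprints would come from a cheminformatics toolkit, and the number of neighbours `k` and any similarity cutoff would need to be tuned on held-out data.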

Literature and Text Mining

DISCUSSION

SUMMARY