DEEPScreen: high performance drug-target interaction prediction with convolutional neural networks using 2-D structural compound representations.

Ahmet Sureyya Rifaioglu, Esra Nalbat, Maria Jesus Martin, Rengul Cetin-Atalay, Tunca Doğan, Volkan Atalay

doi:10.1039/c9sc03414e

Abstract

The identification of physical interactions between drug candidate compounds and target biomolecules is an important process in drug discovery. Since conventional screening procedures are expensive and time-consuming, computational approaches are employed to aid the process by automatically predicting novel drug-target interactions (DTIs). In this study, we propose a large-scale DTI prediction system, DEEPScreen, for early-stage drug discovery, using deep convolutional neural networks. One of the main advantages of DEEPScreen is that it employs readily available 2-D structural representations of compounds at the input level instead of conventional descriptors that display limited performance. DEEPScreen learns complex features inherently from the 2-D representations, thus producing highly accurate predictions. The DEEPScreen system was trained for 704 target proteins (using curated bioactivity data) and finalized with rigorous hyper-parameter optimization tests. We compared the performance of DEEPScreen against the state-of-the-art on multiple benchmark datasets to indicate the effectiveness of the proposed approach, and verified selected novel predictions through molecular docking analysis and literature-based validation. Finally, JAK proteins, which DEEPScreen predicted as new targets of the well-known drug cladribine, were experimentally demonstrated in vitro on cancer cells through phosphorylation of STAT3, the downstream effector protein. The DEEPScreen system can be exploited in the fields of drug discovery and repurposing for in silico screening of the chemogenomic space, to provide novel DTIs which can be experimentally pursued. The source code, trained "ready-to-use" prediction models, all datasets and the results of this study are available at https://github.com/cansyl/DEEPscreen.
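To make the idea of learning features directly from a 2-D compound image more concrete, the following is a minimal sketch of a single convolution step followed by a ReLU activation. It is not the DEEPScreen architecture: the 5x5 toy image, the hand-written `vertical_edge` kernel and the helper names are all assumptions made for illustration, whereas a trained network would learn its kernels from bioactivity data.

```python
from typing import List

Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map: Image) -> Image:
    """Replace negative responses with zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


# Hypothetical 5x5 "drawing": one vertical, bond-like stroke in the middle column.
toy_image = [[0, 0, 1, 0, 0] for _ in range(5)]

# Hypothetical hand-written kernel that responds to vertical strokes.
vertical_edge = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]

feature_map = relu(conv2d_valid(toy_image, vertical_edge))
for row in feature_map:
    print(row)
```

The response is non-zero only where the kernel is centred on the stroke, which is the sense in which a convolutional layer turns raw pixels into localized structural features.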

Highlights

One of the initial steps of drug discovery is the identification of novel drug-like compounds that interact with the predefined target proteins.
DEEPScreen is a collection of deep convolutional neural networks (DCNNs), each of which is an individual predictor for a target protein.
Following the preparation of datasets, we extracted target-protein-based statistics, in terms of amino acid sequences,[7] domains,[39,40] functions, interacting compounds and disease indications.[41,42] The results of this analysis can be found in Electronic supplementary information (ESI) document section 2.1 and Fig. S1.† We carried out several tests to examine the robustness of the DEEPScreen system against input image transformations, since this is a critical topic for CNN architectures that process 2-D images.
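The two ideas in these highlights, one predictor per target and a robustness check against image transformations, can be sketched as follows. Everything here is a hypothetical stand-in: the target names, the toy scoring rules and the helper functions are assumptions for illustration, and rotation by multiples of 90 degrees is only one example of a transformation. The actual DEEPScreen predictors are trained DCNNs, and the source does not specify its test protocol in this section.

```python
from typing import Callable, Dict, List

Image = List[List[int]]
Predictor = Callable[[Image], bool]


def rotate90(image: Image) -> Image:
    """Rotate a 2-D image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def pixel_count_predictor(min_pixels: int) -> Predictor:
    """Toy rule: 'active' if enough pixels are set (orientation independent)."""
    return lambda image: sum(map(sum, image)) >= min_pixels


def top_row_predictor(min_pixels: int) -> Predictor:
    """Toy rule: 'active' if the top row has enough set pixels (orientation dependent)."""
    return lambda image: sum(image[0]) >= min_pixels


# One independent predictor per target protein (hypothetical target names).
predictors: Dict[str, Predictor] = {
    "TARGET_A": pixel_count_predictor(3),
    "TARGET_B": top_row_predictor(2),
}


def screen(image: Image) -> Dict[str, bool]:
    """Run the same compound image through every per-target predictor."""
    return {target: predict(image) for target, predict in predictors.items()}


def rotation_consistency(image: Image) -> Dict[str, bool]:
    """Per target, report whether all four 90-degree rotations get the same label."""
    results: Dict[str, bool] = {}
    for target, predict in predictors.items():
        labels = set()
        rotated = image
        for _ in range(4):
            labels.add(predict(rotated))
            rotated = rotate90(rotated)
        results[target] = len(labels) == 1
    return results


compound_image = [
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 0],
]
print(screen(compound_image))
print(rotation_consistency(compound_image))
```

In this toy setup the pixel-count rule gives the same label under every rotation while the top-row rule does not, which is the kind of discrepancy a robustness test on image inputs is meant to expose.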

Summary

Introduction

One of the initial steps of drug discovery is the identification of novel drug-like compounds that interact with the predefined target proteins. The studies published so far have indicated that DTI prediction is an open problem, where novel ML algorithms and new data representation approaches are required to shed light on the uncharted parts of the DTI space[9,10,11,12,13,14,15,16,17,18,19,20,21] and for other related tasks such as reaction[22] and reactivity predictions[23] and de novo molecular design.[24,25] This effort comprises the identification of novel drug candidate compounds, as well as the repurposing of the existing drugs on the market.[26] In order for the DTI prediction methods to be useful in real-world drug discovery and development research, they should be made available to the research community as tools and/or services via open access repositories. Examples of the deep learning (DL) based frameworks and tools available in the literature for various purposes in computational chemistry based drug discovery include: gnina, a DL framework for molecular docking (repository: https://github.com/gnina/gnina);[27,28,29,30] Chainer Chemistry, a DL framework for chemical property prediction, based on Chainer (repository: https://github.com/chainer/chainer-chemistry);[31] DeepChem, a comprehensive open-source toolchain for DL in drug discovery (repository: https://github.com/deepchem/deepchem);[32] MoleculeNet, a benchmarking system for molecular machine learning, which builds on DeepChem (repository: http://moleculenet.ai/);[13] and SELFIES, a sequence-based representation of semantically constrained graphs, which is applicable to represent chemical compound structures as graphs (repository: https://github.com/aspuru-guzik-group/selfies).[33]

Objectives

Methods

Results

Conclusion