Generating property-matched decoy molecules using deep learning.

Fergus Imrie,Anthony R Bradley,Charlotte M Deane,Alfonso Valencia

doi:10.1093/bioinformatics/btab080

Open Access

Abstract

Motivation: An essential step in the development of virtual screening methods is the use of established sets of actives and decoys for benchmarking and training. However, the decoy molecules in commonly used sets are biased, meaning that methods often exploit these biases to separate actives and decoys, and do not necessarily learn to perform molecular recognition. This fundamental issue prevents generalization and hinders virtual screening method development.

Results: We have developed a deep learning method (DeepCoy) that generates decoys to a user’s preferred specification in order to remove such biases or construct sets with a defined bias. We validated DeepCoy using two established benchmarks, DUD-E and DEKOIS 2.0. For all 102 DUD-E targets and 80 of the 81 DEKOIS 2.0 targets, our generated decoy molecules more closely matched the active molecules’ physicochemical properties while introducing no discernible additional risk of false negatives. The DeepCoy decoys improved the Deviation from Optimal Embedding (DOE) score by an average of 81% and 66%, respectively, decreasing from 0.166 to 0.032 for DUD-E and from 0.109 to 0.038 for DEKOIS 2.0. Further, the generated decoys are harder to distinguish than the original decoy molecules via docking with Autodock Vina, with virtual screening performance falling from an AUC ROC of 0.70 to 0.63.

Availability and implementation: The code is available at https://github.com/oxpig/DeepCoy. Generated molecules can be downloaded from http://opig.stats.ox.ac.uk/resources.

Supplementary information: Supplementary data are available at Bioinformatics online.
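For readers less familiar with the AUC ROC figures quoted above, the following is a minimal sketch of how such a value can be computed from docking scores. It is not the authors' evaluation code. It uses the standard rank interpretation of AUC ROC (the probability that a randomly chosen active is ranked above a randomly chosen decoy) and assumes that scores have been oriented so that higher means better; docking programs that report binding energies would need the sign flipped first. The function name `auc_roc` and all score values are hypothetical.

```python
from typing import Sequence


def auc_roc(active_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """AUC ROC as the fraction of (active, decoy) pairs ranked correctly.

    Assumes higher score = predicted more likely to be active.
    Ties between an active and a decoy count as half a correct pair.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    correct = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                correct += 1.0
            elif a == d:
                correct += 0.5
    return correct / (len(active_scores) * len(decoy_scores))


# Hypothetical, sign-flipped docking scores (illustration only).
actives = [9.1, 8.4, 7.0]
decoys = [8.8, 7.5, 6.9, 6.2]
print(auc_roc(actives, decoys))
```

With these toy values, 9 of the 12 active-decoy pairs are ordered correctly, giving 0.75. A value of 0.5 corresponds to random ranking, so a fall from 0.70 to 0.63 means the docking scores separate actives from the generated decoys less well than from the original decoys.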

Highlights

Virtual screening is a computational approach that is often used in early stage drug discovery to help find molecules that interact with protein targets with high affinity and specificity
When selecting decoys based on the same properties as the original datasets, our generated decoys improved the Deviation from Optimal Embedding (DOE) score by an average of 81% and 66%, respectively, decreasing from 0.166 to 0.032 for DUD-E and 0.109 to 0.038 for DEKOIS 2.0
The DOE score was improved by using DeepCoy generated decoys for all 102 DUD-E targets (Fig. 2) and 80 of the 81 DEKOIS 2.0 targets (Supplementary Fig. S1)
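As a quick arithmetic check on the DOE figures above, the sketch below computes the relative reduction implied by the quoted mean scores. The helper name `relative_reduction` is illustrative. The text does not state how the reported averages were taken; if they are means of per-target improvements (an assumption here), they need not coincide exactly with the reduction in the mean scores.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after` (0.25 means a 25% drop)."""
    return (before - after) / before


# Mean DOE scores quoted in the text: (original decoys, DeepCoy decoys).
mean_doe = {
    "DUD-E": (0.166, 0.032),
    "DEKOIS 2.0": (0.109, 0.038),
}

for benchmark, (original, generated) in mean_doe.items():
    print(f"{benchmark}: {relative_reduction(original, generated):.1%}")
```

This prints 80.7% for DUD-E and 65.1% for DEKOIS 2.0, close to the reported averages of 81% and 66%.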

Summary

Introduction

Virtual screening is a computational approach that is often used in early-stage drug discovery to help find molecules that interact with protein targets with high affinity and specificity. A variety of datasets are available for retrospectively benchmarking virtual screening methods. These sets consist of a collection of active and inactive molecules for a range of protein targets. Commonly used examples for structure-based virtual screening (SBVS) are DUD (Huang et al, 2006), DUD-E (Mysinger et al, 2012), DEKOIS (Bauer et al, 2013; Vogel et al, 2011) and MUV (Rohrer and Baumann, 2009). While experimentally verified inactives represent the gold standard for dataset construction (Lagarde et al, 2015), suitable inactive molecules are often not available. There are efforts to construct sets using only known inactives (e.g. Rohrer and Baumann, 2009; Tran-Nguyen et al, 2020); however, these are relatively limited in size and breadth of protein targets and are not yet suitable for training general-purpose SBVS models using modern machine learning methods.

Methods

Results

Conclusion