Abstract

Machine learning has shown enormous potential for computer-aided drug discovery. Here we show how modern convolutional neural networks (CNNs) can be applied to structure-based virtual screening. We have coupled our densely connected CNN (DenseNet) with a transfer learning approach, which we use to produce an ensemble of protein family-specific models. We conduct an in-depth empirical study and provide the first guidelines on the minimum requirements for adopting a protein family-specific model. Our method also highlights the need for additional data, even in data-rich protein families. Our approach outperforms recent benchmarks on the DUD-E data set and on an independent test set constructed from the ChEMBL database. Using a clustered cross-validation on DUD-E, we achieve an average AUC ROC of 0.92 and a 0.5% ROC enrichment factor of 79. This represents an improvement in early enrichment of over 75% compared to a recent machine learning benchmark. Our results demonstrate that the continued improvements in machine learning architecture for computer vision apply to structure-based virtual screening.

Highlights

  • Drug discovery requires finding molecules that interact with targets with high affinity and specificity.

  • We examine the applicability of modern convolutional neural networks (CNNs) to structure-based virtual screening by utilizing a densely connected convolutional neural network architecture (DenseNet).[29]

  • Our approach achieved state-of-the-art performance on the Directory of Useful Decoys: Enhanced (DUD-E) benchmark, recording an average per-target area under the receiver operating characteristic curve (AUC ROC) of 0.917, an area under the precision-recall curve (AUC PRC) of 0.443, and a 0.5% ROC enrichment factor of 79.3 (Table 2).
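
The metrics quoted above can be illustrated with a minimal sketch. It assumes the common definition of the ROC enrichment factor as the true positive rate reached at a given false positive rate, divided by that false positive rate; the source text does not spell out its exact formula, so this is an assumption. The function names `roc_points`, `auc_roc`, and `roc_enrichment` are hypothetical and the data are toy values, not results from the paper.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists, sweeping the threshold from high to low score.

    Assumes labels are 1 (active) or 0 (decoy) and that both classes occur.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Compounds with tied scores cross the threshold together.
        current = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc_roc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    fpr, tpr = roc_points(labels, scores)
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


def roc_enrichment(labels, scores, fpr_level=0.005):
    """Best TPR among ROC points with FPR <= fpr_level, divided by fpr_level.

    Assumed definition; fpr_level=0.005 corresponds to "0.5% ROC enrichment".
    """
    fpr, tpr = roc_points(labels, scores)
    best_tpr = max(t for f, t in zip(fpr, tpr) if f <= fpr_level)
    return best_tpr / fpr_level


if __name__ == "__main__":
    # Toy ranking: 3 actives and 7 decoys, scored by a hypothetical model.
    labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    print("AUC ROC:", auc_roc(labels, scores))
    print("ROC enrichment at 0.5% FPR:", roc_enrichment(labels, scores))
```

Under this definition the 0.5% ROC enrichment factor is bounded above by 1 / 0.005 = 200, which a perfect ranking would attain, and a random ranking would score about 1.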


Summary

Introduction

Drug discovery requires finding molecules that interact with targets with high affinity and specificity. The use of specific features, such as descriptors[17,19] or fingerprints,[20] both biases the model toward the chosen features and leads to an unnecessary loss of information through the elimination or approximation of the raw structural data. For these reasons, following the work of Ragoza et al.,[21] we have adopted an approach that minimizes initial featurization of input data.

