Abstract

Drug discovery is a cost- and time-intensive process that is often assisted by computational methods, such as virtual screening, to speed up and guide the design of new compounds. For many years, machine learning methods have been successfully applied in the context of computer-aided drug discovery. Recently, thanks to the rise of novel technologies as well as the increasing amount of available chemical and bioactivity data, deep learning has had a tremendous impact on rational active compound discovery. Herein, recent applications and developments of machine learning, with a focus on deep learning, in virtual screening for active compound design are reviewed. This includes introducing different compound and protein encodings, deep learning techniques, as well as frequently used bioactivity and benchmark data sets for model training and testing. Finally, the present state of the art, including the current challenges and emerging problems, is examined and discussed.

Highlights

  • Traditional supervised machine learning (ML) methods follow the idea that, given some data, a predictive model is constructed by minimizing the difference between a given labeled output and the output predicted by the model

  • The encodings for protein and ligand (Section 2.1), the machine learning models (Section 2.2), the data sets (Section 2.3) as well as the model performances (Section 3) are reported and put in context. These studies show overall very promising results on typical benchmarks and often outperform the respective classical approach chosen for comparison, such as docking or more standard machine learning models

  • This is exemplified on the Merck Molecular Activity Kaggle competition data, where deep neural networks have been shown to routinely perform better than random forest models [168]
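
The supervised learning idea in the first highlight (fit a model by reducing the difference between labeled and predicted outputs) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the reviewed article: the toy data, the linear model, the mean squared error loss, and the hypothetical helper names `mean_squared_error` and `fit_linear`.

```python
# Minimal sketch of supervised learning on toy data (illustrative assumption,
# not a method from the reviewed article): fit y = w * x + b by gradient
# descent on the mean squared error between labels and predictions.

def mean_squared_error(w, b, xs, ys):
    """Average squared difference between labeled and predicted outputs."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Hypothetical helper: plain gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy labeled data generated from y = 2x + 1 (assumed for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

loss_before = mean_squared_error(0.0, 0.0, xs, ys)
w, b = fit_linear(xs, ys)
loss_after = mean_squared_error(w, b, xs, ys)
print(f"w={w:.3f}, b={b:.3f}, loss {loss_before:.3f} -> {loss_after:.6f}")
```

Deep learning models follow the same pattern, only with far more parameters and with gradients computed by backpropagation instead of by hand.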

Summary

Virtual Screening

The discovery of drugs begins with the identification of targets for a disease of interest. It is followed by high-throughput screening (HTS) experiments to determine hits within the synthesized compound library, i.e., compounds showing promising bioactivity. Note that pharmacophore-based VS has incorporated machine learning and is suitable for screening very large databases; see, for example, Pharmit [18]. These methods are not the focus of this review, and recent developments in the pharmacophore field are described by Schaller et al. [19]. Unlike structure-based methods, ligand-based methods only require ligand information. Note that they are not the focus of this review, and the reader is kindly referred to the respective literature, e.g., [20,21]. To distinguish the described methods, which handle the two objects individually, we refer to them as pair-based methods (see Figure 1).

Machine Learning and Deep Learning
Data Availability and Big Data
Deep Learning in Virtual Screening
Encodings in Virtual Screening
Ligand Encodings
Protein Encodings
Complex Encodings
Deep Learning Models in Virtual Screening
Supervised Deep Learning Models
Model Evaluation Strategies and Metrics
Data Sets and Benchmarks in Virtual Screening
Structure-Based Data Sets
Bioactivity Data Sets
Benchmarking Data Sets
Recent Developments
Complex-Based Models
Pair-Based Models
Abbreviations
Conclusions and Discussion
Precision of chemical encoding
Generalization of chemical space
Interpretability
Results
