Abstract
Drug discovery is a cost- and time-intensive process that is often assisted by computational methods, such as virtual screening, to speed up and guide the design of new compounds. For many years, machine learning methods have been successfully applied in the context of computer-aided drug discovery. Recently, thanks to the rise of novel technologies as well as the increasing amount of available chemical and bioactivity data, deep learning has had a tremendous impact on rational active compound discovery. Herein, recent applications and developments of machine learning, with a focus on deep learning, in virtual screening for active compound design are reviewed. This includes introducing different compound and protein encodings, deep learning techniques, and frequently used bioactivity and benchmark data sets for model training and testing. Finally, the present state of the art, including the current challenges and emerging problems, is examined and discussed.
Highlights
Traditional supervised machine learning (ML) methods follow the idea that, given some data, a predictive model is constructed by minimizing the difference between a given labeled output and the output predicted by the model.
The encodings for protein and ligand (Section 2.1), the machine learning models (Section 2.2), the data sets (Section 2.3), and the model performances (Section 3) are reported and put in context. These studies show overall very promising results on typical benchmarks and often outperform the respective classical approach chosen for comparison, such as docking or more standard machine learning models.
This is exemplified by the Merck Molecular Activity Kaggle competition data, where deep neural networks have been shown to routinely perform better than random forest models [168].
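The supervised-learning idea stated in the first highlight (fit a model by minimizing the difference between labeled and predicted outputs) can be illustrated with a minimal sketch. This is not a method from the reviewed studies: the linear model, the mean-squared-error loss, the function names `mean_squared_error` and `fit_linear`, the learning rate, and the toy data are all assumptions chosen for illustration only.

```python
def mean_squared_error(w, b, xs, ys):
    """Average squared difference between labels ys and predictions w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_linear(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by plain gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1 (not real bioactivity data).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = fit_linear(xs, ys)
initial_loss = mean_squared_error(0.0, 0.0, xs, ys)
final_loss = mean_squared_error(w, b, xs, ys)
```

In practice the inputs would be compound or protein encodings and the labels measured bioactivities, and the model would typically be far more expressive than a line, but the training principle of reducing a loss between labels and predictions is the same.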
Summary
The discovery of drugs begins with the identification of targets for a disease of interest. It is followed by high-throughput screening (HTS) experiments to determine hits within the synthesized compound library, i.e., compounds showing promising bioactivity. Note that pharmacophore-based VS has incorporated machine learning and is suitable for screening very large databases; see, for example, Pharmit [18]. These methods are not the focus of this review, and recent developments in the pharmacophore field are described by Schaller et al. [19]. Unlike structure-based methods, ligand-based methods only require ligand information. Note that they are not the focus of this review, and the reader is kindly referred to the respective literature, e.g., [20,21]. To distinguish the described methods, which handle the two objects (protein and ligand) individually, we refer to them as pair-based methods; see Figure 1.