Trends in Deep Learning for Property-driven Drug Design.

Matteo Manica, Jannis Born

doi:10.2174/0929867328666210729115728

Abstract

It is more pressing than ever to reduce the time and cost of developing lead compounds in the pharmaceutical industry. The co-occurrence of advances in high-throughput screening and the rise of deep learning (DL) has enabled the development of large-scale multimodal predictive models for virtual drug screening. Recently, deep generative models have emerged as a powerful tool to explore the chemical space and have raised hopes of expediting the drug discovery process. Following this progress in chemocentric approaches for generative chemistry, the next challenge is to build multimodal conditional generative models that leverage disparate knowledge sources when mapping biochemical properties to target structures. Here, we call on the community to bridge drug discovery more closely with systems biology when designing deep generative models. Complementing the plethora of reviews on the role of DL in chemoinformatics, we specifically focus on the interface of predictive and generative modelling for drug discovery. Through a systematic publication keyword search on PubMed and a selection of preprint servers (arXiv, bioRxiv, ChemRxiv, and medRxiv), we quantify trends in the field and find that molecular graphs and VAEs have become the most widely adopted molecular representations and architectures in generative models, respectively. We discuss progress on DL for toxicity, drug-target affinity, and drug sensitivity prediction and specifically focus on conditional molecular generative models that encompass multimodal prediction models. Moreover, we outline future prospects in the field and identify challenges such as the integration of deep learning systems into experimental workflows in a closed-loop manner or the adoption of federated machine learning techniques to overcome data sharing barriers. Other challenges include, but are not limited to, interpretability in generative models, more sophisticated metrics for the evaluation of molecular generative models, and, following up on that, community-accepted benchmarks for both multimodal drug property prediction and property-driven molecular design.

Highlights

The term Eroom’s Law was coined to describe the phenomenon that the costs for research and development of new FDA-approved drugs have roughly doubled every nine years since the 1950s [1]
Complementing the plethora of reviews on the role of deep learning (DL) in chemoinformatics, we focus on the interface of predictive and generative modelling for drug discovery
The current state of the field has been summarized as: “the current evaluations for generative models do not reflect the complexity of real discovery problems” [247]

Summary

Introduction

The term Eroom’s Law was coined to describe the phenomenon that the costs for research and development of new FDA-approved drugs have roughly doubled every nine years since the 1950s [1]. The sheer size of the chemical space and the non-linearity of the desired pharmacological properties render unconditional rejection sampling approaches, in which the feedback for the generative model is based solely on the outcome of a virtual drug screening, largely impractical. Instead, these factors call for conditional molecular generators, which receive context about the desired properties or conditions in the form of vectorial representations that directly steer the generative process. Distributed [141] [142]
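
The contrast between unconditional rejection sampling and conditional generation can be illustrated with a toy sketch. It is not taken from the review: the one-dimensional stand-in for chemical space, the quadratic "property", the noise level, and all function names are assumptions chosen only to make the idea concrete. The conditional generator is assumed to have already learned an approximate inverse of the property, which is the hard part in practice.

```python
import math
import random
from typing import Callable, Optional, Tuple


def predicted_property(x: float) -> float:
    """Hypothetical stand-in for a virtual screening predictor (non-linear)."""
    return x * x


def unconditional_generator(rng: random.Random) -> float:
    """Proposes a candidate with no knowledge of the desired property."""
    return rng.uniform(0.0, 10.0)


def conditional_generator(rng: random.Random, target: float) -> float:
    """Proposes a candidate steered by a context (here, just the target value).

    Toy assumption: the generator has learned an approximate inverse of the
    property function, up to Gaussian noise.
    """
    return math.sqrt(target) + rng.gauss(0.0, 0.05)


def sample_until_hit(
    propose: Callable[[], float], target: float, tol: float, max_tries: int
) -> Tuple[Optional[float], int]:
    """Draw candidates until the predicted property is within tol of target.

    The only feedback is accept/reject from the screening step.
    """
    for tries in range(1, max_tries + 1):
        candidate = propose()
        if abs(predicted_property(candidate) - target) <= tol:
            return candidate, tries
    return None, max_tries


def mean_tries(
    propose: Callable[[], float],
    target: float,
    tol: float,
    max_tries: int,
    repeats: int,
) -> float:
    """Average number of proposals needed per accepted candidate."""
    total = sum(
        sample_until_hit(propose, target, tol, max_tries)[1]
        for _ in range(repeats)
    )
    return total / repeats


if __name__ == "__main__":
    rng = random.Random(0)
    target, tol = 49.0, 0.5
    rejection = mean_tries(
        lambda: unconditional_generator(rng), target, tol, 100_000, 200
    )
    conditional = mean_tries(
        lambda: conditional_generator(rng, target), target, tol, 100_000, 200
    )
    print(f"mean proposals per hit, rejection sampling: {rejection:.1f}")
    print(f"mean proposals per hit, conditional generator: {conditional:.1f}")
```

Under these toy assumptions, an unconditional proposal lands in the accepted window with a probability of roughly 0.7%, whereas a conditional proposal does so about half of the time. In a realistic chemical space the acceptance window is far smaller, which is why generate-then-screen loops scale poorly.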

Objectives

Results

Conclusion