Abstract

The number of ‘small’ molecules that may be of interest to chemical biologists (chemical space) is enormous, but the fraction that has ever been made is tiny. Most strategies to date have been discriminative, i.e. they have addressed ‘forward’ problems (have molecule, establish properties). However, we normally wish to solve the much harder generative or inverse problem (describe desired properties, find molecule). ‘Deep’ (machine) learning based on large-scale neural networks underpins technologies such as computer vision, natural language processing, driverless cars, and world-leading performance in games such as Go; it can also be applied to the solution of inverse problems in chemical biology. In particular, recent developments in deep learning admit the in silico generation of candidate molecular structures and the prediction of their properties, thereby allowing one to navigate (bio)chemical space intelligently. These methods are revolutionary but require an understanding of both (bio)chemistry and computer science to be exploited to best advantage. We give a high-level (non-mathematical) background to the deep learning revolution, and set out the crucial issue for chemical biology and informatics as a two-way mapping between the discrete nature of individual molecules and the continuous but high-dimensional latent representation that may best reflect chemical space. A variety of architectures can do this; we focus on a particular type known as variational autoencoders. We then provide some examples of recent successes of these kinds of approach, and a look towards the future.
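To make the ‘two-way mapping’ concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes molecules are written as SMILES strings, and it uses a hypothetical toy alphabet and maximum length (`ALPHABET`, `MAX_LEN`) chosen only for illustration. It shows the discrete side (a one-hot matrix and its exact inverse) and the standard sampling step by which a variational autoencoder draws a continuous latent point; the neural encoder and decoder that would sit between the two are omitted.

```python
import math
import random

# Hypothetical toy alphabet (an assumption for illustration only):
# just enough symbols for a few simple SMILES strings; " " is padding.
ALPHABET = ["C", "N", "O", "(", ")", "=", "1", " "]
MAX_LEN = 12


def one_hot_encode(smiles, alphabet=ALPHABET, max_len=MAX_LEN):
    """Map a SMILES string to a max_len x len(alphabet) one-hot matrix."""
    if len(smiles) > max_len:
        raise ValueError("SMILES string longer than max_len")
    index = {ch: i for i, ch in enumerate(alphabet)}
    padded = smiles.ljust(max_len)  # pad with spaces up to max_len
    return [
        [1.0 if index[ch] == j else 0.0 for j in range(len(alphabet))]
        for ch in padded
    ]


def decode(matrix, alphabet=ALPHABET):
    """Invert the encoding: take the arg-max symbol per row, strip padding."""
    chars = [alphabet[max(range(len(row)), key=row.__getitem__)] for row in matrix]
    return "".join(chars).rstrip()


def reparameterise(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension.

    In a real VAE, mu and log_var would be produced by a trained encoder
    network; here they are simply passed in by the caller.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


# Usage: discrete round trip, then one sample from a 2-D latent distribution.
matrix = one_hot_encode("CC(=O)O")  # acetic acid
print(decode(matrix))
print(reparameterise([0.0, 0.0], [0.0, 0.0], random.Random(0)))
```

The one-hot round trip is exact, whereas in a trained model the decoder only returns a probability over symbols at each position, which is why decoded strings are not guaranteed to be valid molecules.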

Highlights

  • Much of chemical biology is involved with the study of the interactions between small molecules and biomacromolecules, along with any physiological consequences, usually with the aim of finding molecules that are in some senses ‘better’.

  • The classical version of chemical genomics was data-driven or ‘function first’; a small molecule was applied to the system of interest and it either worked or it did not.

  • ‘Explainable AI’ will continue to be an important area for the future. This has been a purposely high-level overview of some of the possibilities in cheminformatics and chemical biology that have been engendered by the development of deep learning methods in general and of generative methods in particular.

Summary

Introduction

Much of chemical biology is involved with the study of the interactions between small molecules and biomacromolecules, along with any physiological consequences, usually with the aim of finding molecules that are in some senses ‘better’. The classical version of chemical genomics was data-driven or ‘function first’; a small molecule was applied to the system of interest and it either worked or it did not. In a more modern version, a target (or, much more occasionally, a set of targets) is sought on the basis of a hypothesis, usually about the desirability of inhibiting said target, and is typically tested as a purified protein in vitro. The nominal advantage of this reverse approach is that in theory one immediately has a mechanism. Even this is illusory, as effective drugs normally have multiple targets [2], and the ability of a molecule to bind to a target in vitro conveys little or nothing about its mechanisms, efficacy or toxicity in vivo [3], nor even whether it can reach the supposed target(s) (membrane transporters are normally involved [3,4,5,6,7]). As with protein optimisation [8], the search for such molecules is arguably best seen as a navigation through a large search space of possible solutions [9].
