Molecular optimization by capturing chemist’s intuition using deep neural networks

Jiazhen He, Ola Engkvist, Emil Sandström, Esben Jannik Bjerrum, Werngard Czechtizky, Eva Nittinger, Huifang You, Christian Tyrchan

doi:10.1186/s13321-021-00497-0

Open Access

https://doi.org/10.1186/s13321-021-00497-0

Abstract

A main challenge in drug discovery is finding molecules with a desirable balance of multiple properties. Here, we focus on the task of molecular optimization, where the goal is to optimize a given starting molecule towards desirable properties. This task can be framed as a machine translation problem in natural language processing, where, in our case, a molecule is translated into a molecule with optimized properties based on the SMILES representation. Typically, chemists use their intuition to suggest chemical transformations for the starting molecule being optimized. A widely used strategy is the concept of matched molecular pairs, where two molecules differ by a single transformation. We seek to capture the chemist’s intuition from matched molecular pairs using machine translation models. Specifically, the sequence-to-sequence model with attention mechanism and the Transformer model are employed to generate molecules with desirable properties. As a proof of concept, three ADMET properties are optimized simultaneously: logD, solubility, and clearance, which are important properties of a drug. Since desirable properties often vary from project to project, the user-specified desirable property changes are incorporated into the input as an additional condition together with the starting molecules being optimized. Thus, the models can be guided to generate molecules satisfying the desirable properties. Additionally, we compare the two machine translation models based on the SMILES representation with a graph-to-graph translation model, HierG2G, which has shown state-of-the-art performance in molecular optimization. Our results show that the Transformer can generate more molecules with desirable properties by making small modifications to the given starting molecules, which can be intuitive to chemists. A further enrichment of diverse molecules can be achieved by using an ensemble of models.

Highlights

A main challenge in drug discovery is finding molecules with desirable properties
The simplified molecular-input line-entry system (SMILES) representation of molecules [40], as a string-based representation, is used in our study to facilitate the use of machine translation models from natural language processing (NLP)
In order to translate source molecules into target molecules with customized properties, the encoded property changes are concatenated with the SMILES representation of starting molecules as input sequences for machine translation models, while the target sequences are the SMILES representation of target molecules
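
The input construction described in the last highlight can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular expression is a commonly used SMILES tokenization pattern, and the function names and the property-change labels (`logD_decrease`, `solubility_low->high`, `clearance_no_change`) are hypothetical placeholders chosen for illustration, since the exact encoding is not given in this text.

```python
import re

# A commonly used SMILES tokenization pattern (an assumption, not taken from the paper):
# bracket atoms, two-letter halogens, two-digit ring closures, single-letter atoms,
# bond and branch symbols, and single-digit ring closures.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFIbcnops]|[=#$:/\\\-+().@~*]|\d)"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail loudly if any character is skipped."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize SMILES completely: {smiles!r}")
    return tokens


def build_source_sequence(property_changes, source_smiles):
    """Concatenate property-change tokens with the tokenized starting molecule.

    property_changes maps a property name to a change label. Both the names and
    the labels used in the example below are hypothetical.
    """
    condition_tokens = [f"{name}_{change}" for name, change in property_changes.items()]
    return condition_tokens + tokenize_smiles(source_smiles)


def build_training_pair(property_changes, source_smiles, target_smiles):
    """Return (input sequence, target sequence) for one matched molecular pair."""
    return (
        build_source_sequence(property_changes, source_smiles),
        tokenize_smiles(target_smiles),
    )


if __name__ == "__main__":
    desired = {"logD": "decrease", "solubility": "low->high", "clearance": "no_change"}
    source, target = build_training_pair(desired, "CC(=O)Nc1ccc(Cl)cc1", "CC(=O)Nc1ccc(O)cc1")
    print(source)
    print(target)
```

Placing the condition tokens in front of the molecule lets a standard sequence-to-sequence or Transformer encoder read the desired property changes and the starting molecule as one sequence, so no architectural change is needed to condition the generation.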

Summary

Introduction

A drug requires a balance of multiple properties, e.g. physicochemical properties, ADMET (absorption, distribution, metabolism, elimination and toxicity) properties, safety and potency against its target. Finding such a drug in the extremely large chemical space (i.e. 10^23 to 10^60) [1] is challenging. Chemists typically use their intuition to suggest transformations to improve the promising molecule. These suggestions do not hold in general, and they become more problematic and difficult to apply when optimizing multiple properties simultaneously. Conditional generative models [15, 18, 29, 30] have been developed in which the desirable properties are incorporated as a condition to directly control the generation process. Another approach is to use reinforcement learning to modify a molecule directly based on its molecular graph representation [31, 32].

Journal: Journal of Cheminformatics
Citation: He et al. J Cheminform (2021) 13:26
Publication Date: Mar 20, 2021
Citations: 56
License type: open-access
