Abstract

The “creativity” of Artificial Intelligence (AI) in generating de novo molecular structures opened a novel paradigm in compound design, weaknesses (stability and feasibility issues of such structures) notwithstanding. Here we show that “creative” AI may be just as successfully taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class. A sequence-to-sequence autoencoder with bidirectional Long Short-Term Memory layers was trained on purpose-built “SMILES/CGR” strings encoding reactions of the USPTO database. The autoencoder latent space was visualized on a generative topographic map. Novel latent space points were sampled around a map area populated by Suzuki reactions and decoded to the corresponding reactions. These can be critically analyzed by the expert, cleaned of irrelevant functional groups and eventually attempted experimentally, thereby enlarging the synthetic purpose of popular synthetic pathways.

Highlights

  • Generative models based on recurrent deep neural networks were successfully used to generate novel chemical structures[28,29,30,31,32,33,34,35,36,37]

  • We showed that in silico chemical reaction handling can be significantly simplified by the Condensed Graph of Reaction (CGR) approach[38], in which the structures of reactants and products are merged into a single graph (Fig. 1)

  • The reconstruction rate was 98.4% and 97.8% at the training and validation stage, respectively. This is slightly lower than the reconstruction rates of plain molecular SMILES by state-of-the-art encoders/decoders, but it can be explained by the greater complexity and length of SMILES/CGR strings and by an additional source of error: atom-to-atom mapping errors in some entries


Summary

Introduction

Generative models based on recurrent deep neural networks were successfully used to generate novel chemical structures[28,29,30,31,32,33,34,35,36,37]. The CGR edges correspond either to standard chemical bonds or to “dynamic” bonds describing transformations. In this way, one can consider a CGR as a pseudomolecule for which some types of molecular descriptors can be computed and then applied in data analysis and statistical modeling tasks[39]. This approach was successfully applied to similarity searching in reaction databases[38,40], building quantitative structure–reactivity models[41,42,43,44], assessment of tautomer distributions[45,46], prediction of activity cliffs[47], classification of enzymatic transformations[48], prediction of reaction conditions[49,50], etc. Notice that visualization is not strictly required for cluster identification, but it may significantly help to choose a cluster from which the sampling is performed.
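
The distinction between standard and “dynamic” CGR edges can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not produce SMILES/CGR strings: it only assumes that atoms are identified by atom-to-atom mapping numbers and that each bond carries an integer order. The helper names `bonds`, `condense` and `dynamic_bonds`, as well as the toy substitution reaction, are hypothetical and chosen for illustration only.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# A bond is an unordered pair of atom-map numbers (assumed representation).
Bond = FrozenSet[int]
# Each CGR edge stores (order in reactants, order in products); None = bond absent.
Change = Tuple[Optional[int], Optional[int]]


def bonds(*triples: Tuple[int, int, int]) -> Dict[Bond, int]:
    """Build a bond table from (atom_a, atom_b, order) triples."""
    return {frozenset((a, b)): order for a, b, order in triples}


def condense(reactants: Dict[Bond, int], products: Dict[Bond, int]) -> Dict[Bond, Change]:
    """Merge reactant and product bond tables into one graph of labeled edges."""
    return {
        bond: (reactants.get(bond), products.get(bond))
        for bond in reactants.keys() | products.keys()
    }


def dynamic_bonds(cgr: Dict[Bond, Change]) -> Dict[Bond, Change]:
    """Keep only edges whose order differs between reactants and products."""
    return {bond: change for bond, change in cgr.items() if change[0] != change[1]}


# Toy substitution: atom 1 = C, 2 = Br, 3 = O, 4 = neighbouring C (hypothetical mapping).
reactant_bonds = bonds((1, 2, 1), (1, 4, 1))  # C1-Br2 and C1-C4
product_bonds = bonds((1, 3, 1), (1, 4, 1))   # C1-O3 and C1-C4
cgr = condense(reactant_bonds, product_bonds)
changed = dynamic_bonds(cgr)
```

In this toy case the C1-C4 edge is a standard bond, since it is unchanged, while C1-Br2 (broken) and C1-O3 (formed) are dynamic edges. Because the result is a single labeled graph, it can be treated like a pseudomolecule when computing descriptors.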
