Abstract

Effective computational prediction of syntheses of complex or novel molecules can greatly help organic and medicinal chemistry. Retrosynthetic analysis is a method employed by chemists to predict synthetic routes to target compounds: the target compounds are incrementally converted into simpler precursors until the starting materials are commercially available. However, predictions based on small chemical datasets often suffer from low accuracy because there are too few training samples. To address this limitation, we introduced transfer learning to retrosynthetic analysis. Transfer learning is a machine learning approach that trains a model on one task and then applies it to a related but different task, which makes it well suited to mitigating data scarcity. The unclassified USPTO-380K large dataset was first used to pretrain the models so that they acquire basic chemical knowledge, such as compound chirality, reaction types and the SMILES representation of chemical structures. Both USPTO-380K and USPTO-50K (the latter also used by Liu et al.) were originally derived from Lowe's patent-mining work; Liu et al. further processed these data and divided the reaction examples into 10 categories, whereas we left USPTO-380K unclassified. Subsequently, the pretrained models were further trained and tested on retrosynthetic reactions using the classified USPTO-50K small dataset, and their accuracy was compared with that of models trained without pretraining. The transfer learning concept was combined with the sequence-to-sequence (seq2seq) or Transformer model for prediction and verification. The seq2seq and Transformer models, both of which are based on an encoder-decoder architecture, were originally constructed for language translation tasks.
The two algorithms translate between the SMILES representations of reactants and products, also taking into account other relevant chemical information (chirality, reaction types and conditions). The results demonstrated that the accuracy of retrosynthetic analysis by the seq2seq and Transformer models was significantly improved after pretraining. The top-1 accuracy (the rate at which the first prediction matches the actual result) of the Transformer transfer-learning model increased from 52.4% to 60.7%, a greatly improved prediction power. The model's top-20 accuracy (the rate at which the top 20 predictions contain the actual result) was 88.9%, which represents fairly good performance in retrosynthetic analysis. In summary, this study demonstrates that transferring knowledge between models trained on different chemical datasets is feasible. Introducing transfer learning significantly improved prediction accuracy and, in particular, assisted reaction prediction and retrosynthetic analysis based on small datasets.
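Both models operate on tokenized SMILES strings rather than raw characters, so multi-character units such as `Br`, `Cl` and bracketed atoms like `[C@@H]` (which carries chirality information) must stay intact as single tokens. The paper does not spell out its exact tokenization scheme here; the regex below is a commonly used SMILES tokenizer, shown only as an illustrative sketch.

```python
import re

# A widely used SMILES tokenization pattern: bracketed atoms, two-letter
# halogens (Br, Cl), ring-closure digits, bond and branch symbols each
# become one token. Illustrative choice, not necessarily the paper's scheme.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:"
    r"|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize(smiles: str) -> list[str]:
    tokens = SMILES_TOKEN.findall(smiles)
    # Sanity check: tokenization must be lossless.
    assert "".join(tokens) == smiles, f"untokenizable SMILES: {smiles}"
    return tokens

print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))   # aspirin: single-character tokens
print(tokenize("[C@@H](Br)Cl"))            # chirality tag kept as one token
```

The lossless round-trip check matters in practice: any character the pattern cannot match would be silently dropped by `findall` and corrupt the training data.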

Highlights

  • Organic synthesis is a crucial discipline that predicts synthetic access to molecules

  • The results demonstrated that the accuracy of the retrosynthetic analysis by the seq2seq and Transformer models after pretraining was significantly improved

  • The encoder processes the input sequence and passes the corresponding context vector to the decoder. The decoder uses this representation to generate a set of predictions. These two recurrent neural networks (RNNs) consist of long short-term memory (LSTM) cells, which efficiently handle long-range dependencies in sequences [20]
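To make the encoder-decoder flow concrete, here is a minimal NumPy sketch with toy dimensions and random, untrained weights (an illustrative stand-in, not the paper's architecture): the encoder LSTM consumes a token sequence, its final hidden/cell state serves as the context vector, and that context initializes the decoder LSTM, which then greedily emits output tokens one at a time.

```python
import numpy as np

rng = np.random.default_rng(42)
VOCAB, EMB, HID = 12, 8, 16   # toy sizes; real models are far larger

def make_cell():
    # Weights for the 4 LSTM gates (input, forget, output, candidate), stacked.
    return (rng.normal(0, 0.1, (4 * HID, EMB)),
            rng.normal(0, 0.1, (4 * HID, HID)),
            np.zeros(4 * HID))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    # One LSTM step: gates from current input x and previous hidden state h.
    i, f, o, g = np.split(W @ x + U @ h + b, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

embed = rng.normal(0, 0.1, (VOCAB, EMB))   # toy embedding table
proj = rng.normal(0, 0.1, (VOCAB, HID))    # decoder output projection
enc_W, enc_U, enc_b = make_cell()
dec_W, dec_U, dec_b = make_cell()

def encode(token_ids):
    """Encoder consumes the input sequence; final (h, c) is the context."""
    h = c = np.zeros(HID)
    for t in token_ids:
        h, c = lstm_step(embed[t], h, c, enc_W, enc_U, enc_b)
    return h, c

def decode(context, max_len=5, bos=0):
    """Decoder starts from the encoder context and greedily emits tokens."""
    h, c = context
    tok, out = bos, []
    for _ in range(max_len):
        h, c = lstm_step(embed[tok], h, c, dec_W, dec_U, dec_b)
        tok = int(np.argmax(proj @ h))   # greedy choice over the vocabulary
        out.append(tok)
    return out

product = [3, 5, 5, 1, 7]            # stand-in for a tokenized product SMILES
reactants = decode(encode(product))  # stand-in for predicted reactant tokens
```

Trained models replace the greedy `argmax` with beam search, which is what makes ranked top-1 through top-20 candidate lists possible.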

Introduction

Organic synthesis is a crucial discipline that predicts synthetic access to molecules. Kim and coworkers trained seq2seq models on chemical reaction prediction tasks and, regarding reactions as a language, first introduced the concept of treating chemical reactions as a translation problem [6]. Schwaller and Lee's group successfully applied a Molecular Transformer model to uncertainty-calibrated chemical reaction prediction [9]. As a kind of AI technology, transfer learning can be applied to organic and medicinal chemistry, especially to reaction prediction and retrosynthetic analysis based on datasets with very limited data volumes. When deep learning is applied to retrosynthetic analysis or to predicting the products of such reactions, it is difficult to obtain accurate predictions because the dataset is too limited to adequately train the model. In this work (Figure 2), to increase the accuracy of retrosynthetic analysis, we introduced the transfer learning strategy into the seq2seq and Transformer models.
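The pretrain-then-fine-tune strategy can be illustrated with a deliberately tiny, self-contained example (synthetic one-dimensional regression standing in for reaction prediction; all numbers are illustrative, not from the paper): a model pretrained on a large "source" task enters fine-tuning on a small related "target" task much closer to the optimum than a model starting from scratch, so the same small budget of updates yields a better fit.

```python
# Toy illustration of transfer learning: gradient descent on y = w * x.
# "Pretraining" fits a large source task (true w = 2.0); "fine-tuning"
# continues from that weight on a small, related target task (true w = 2.1).
# All values are synthetic stand-ins, not chemistry data.

def gd_step(w, xs, ys, lr=0.05):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
    return w - lr * grad

def train(w, xs, ys, steps):
    for _ in range(steps):
        w = gd_step(w, xs, ys)
    return w

src_x = [1.0, 2.0, 3.0]; src_y = [2.0 * x for x in src_x]  # large source task
tgt_x = [1.0, 2.0, 3.0]; tgt_y = [2.1 * x for x in tgt_x]  # small target task

w_pre = train(0.0, src_x, src_y, steps=50)     # pretraining
w_ft = train(w_pre, tgt_x, tgt_y, steps=3)     # fine-tuning: warm start
w_scr = train(0.0, tgt_x, tgt_y, steps=3)      # same budget, from scratch

print(abs(w_ft - 2.1), abs(w_scr - 2.1))       # warm start lands far closer
```

The same logic motivates pretraining on USPTO-380K: the models enter USPTO-50K training already knowing SMILES grammar and general reaction patterns, so the small classified dataset only has to supply the task-specific signal.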

Dataset Preparation
Pretraining Dataset Preparation
Seq2seq Model
Transformer Model
Performance Evaluation
Heterocycle
Comparisons
Comparisons and Representative Examples of the Transformer
Conclusion
Findings
Conclusions
