Abstract

Computer Assisted Synthesis Planning (CASP) has gained considerable interest in recent years. Herein we investigate a template-based retrosynthetic planning tool, trained on a variety of datasets consisting of up to 17.5 million reactions. We demonstrate that models trained on datasets such as internal Electronic Laboratory Notebooks (ELN) and the publicly available United States Patent and Trademark Office (USPTO) extracts are sufficient for the prediction of full synthetic routes to compounds of interest in medicinal chemistry. To establish this, we assessed the models on 1731 compounds from 41 virtual libraries for which experimental results were known. Furthermore, we show that accuracy is a misleading metric for assessing the policy network, and propose that the number of successfully applied templates, in conjunction with the overall ability to generate full synthetic routes, be examined instead. In doing so, we found that template specificity comes at the cost of generalizability and overall model performance. This is supplemented by a comparison of the underlying datasets and their corresponding models.
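The sketch below makes the contrast between top-k accuracy and the number of successfully applied templates concrete by computing both on toy data. It is a minimal illustration under stated assumptions and not the study's evaluation code. The template identifiers, targets and the `applies` predicate are all hypothetical. In practice `applies` would run the reaction template on the target molecule (for example with RDKit or RDChiral) and report whether any precursors are produced.

```python
from typing import Callable, List, Sequence, Set, Tuple


def top_k_accuracy(ranked: Sequence[Sequence[str]], recorded: Sequence[str], k: int) -> float:
    """Fraction of targets whose recorded template is among the top-k predictions."""
    hits = sum(1 for preds, true in zip(ranked, recorded) if true in preds[:k])
    return hits / len(recorded)


def applicable_template_counts(
    targets: Sequence[str],
    ranked: Sequence[Sequence[str]],
    applies: Callable[[str, str], bool],
    k: int,
) -> List[int]:
    """Number of top-k predicted templates that successfully apply to each target."""
    return [
        sum(1 for template in preds[:k] if applies(template, target))
        for target, preds in zip(targets, ranked)
    ]


# Hypothetical toy data: two targets, three ranked template predictions each.
targets = ["A", "B"]
ranked = [["t1", "t2", "t3"], ["t4", "t5", "t6"]]
recorded = ["t3", "t9"]  # template recorded in the dataset for each target

# Hypothetical stand-in for running a template on a molecule.
APPLICABLE: Set[Tuple[str, str]] = {("t1", "A"), ("t2", "A"), ("t5", "B")}


def applies(template: str, target: str) -> bool:
    return (template, target) in APPLICABLE


print(top_k_accuracy(ranked, recorded, k=1))  # 0.0
print(top_k_accuracy(ranked, recorded, k=3))  # 0.5
print(applicable_template_counts(targets, ranked, applies, k=3))  # [2, 1]
```

Target A misses its recorded template at top-1 and so counts against accuracy. Yet two of its top-3 templates apply and could each seed a route. Accuracy alone does not capture that information.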

Highlights

  • Developments in computer assisted synthesis planning (CASP), specifically retrosynthetic analysis, have gained considerable interest in recent years.[1]

  • We investigate the role of the template prioritization method and the tree search algorithm derived from the work of Segler and Waller.[3]

  • This has since been corrected by Coley et al. in RDChiral and has been extended in this study to encompass ca. 75 functional and protecting groups commonly used in organic synthesis.[32]
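The tree search mentioned above can be illustrated with a PUCT-style selection step. In this scheme the policy network's probability for each template acts as a prior that steers exploration. This is a minimal sketch that assumes an AlphaGo-style PUCT rule with a hypothetical exploration constant `c`. The selection formula, reward and constants actually used in the study, or in the Segler and Waller search it derives from, may differ.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    prior: float              # policy-network probability of the template that led here
    visits: int = 0
    value_sum: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c: float = 1.4) -> float:
    # Exploitation (mean reward) plus a prior-weighted exploration bonus
    # that shrinks as the child is visited more often.
    bonus = c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.mean_value() + bonus


def select_child(parent: Node, c: float = 1.4) -> Node:
    return max(parent.children, key=lambda child: puct_score(parent, child, c))


def backpropagate(path: List[Node], reward: float) -> None:
    # Propagate a rollout reward (e.g. 1.0 if a route was solved) up the visited path.
    for node in path:
        node.visits += 1
        node.value_sum += reward


# Hypothetical statistics after a few search iterations.
node_a = Node(prior=0.7, visits=5, value_sum=2.0)
node_b = Node(prior=0.2)
node_c = Node(prior=0.1, visits=4, value_sum=3.0)
root = Node(prior=1.0, visits=10, children=[node_a, node_b, node_c])

best = select_child(root)  # node_a
backpropagate([root, best], reward=1.0)
```

In this toy state the unvisited template `node_b` scores close to `node_a` because its exploration bonus is not yet damped by visits. The policy prior and the visit counts trade exploration off against exploitation in this way.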


Summary

Introduction

Developments in computer assisted synthesis planning (CASP), specifically retrosynthetic analysis, have gained considerable interest in recent years.[1] Retrosynthetic planning or analysis refers to the technique used by chemists to recursively deconstruct a compound into its […] we investigate the role of the template prioritization method and the tree search algorithm derived from the work of Segler and Waller.[3] Template prioritization is framed as a multiclass classification problem. For this we employ a neural network, referred to as the policy network, that outputs the probability of applying any given template. This constitutes the machine learning (ML) part of the process, which we couple to a search strategy and decision-making process in the form of a tree search. We examine this model in the context of the underlying datasets, pooling from internal […]
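Framing template prioritization as multiclass classification means the policy network ends in one output per template, normalized with a softmax. During the search, only the highest-ranked templates are then tried. The sketch below assumes that design and uses plain Python in place of a trained network, with made-up logits. The truncation rule is a hypothetical choice for illustration, not a setting reported in the text: it keeps at most `max_templates` templates and stops once their cumulative probability reaches `cumulative_cutoff`. The default values of both parameters are likewise hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_templates(
    logits: Sequence[float],
    max_templates: int = 50,
    cumulative_cutoff: float = 0.995,
) -> List[Tuple[int, float]]:
    """Return (template index, probability) pairs, most probable first."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    chosen: List[Tuple[int, float]] = []
    cumulative = 0.0
    for index, prob in ranked[:max_templates]:
        chosen.append((index, prob))
        cumulative += prob
        if cumulative >= cumulative_cutoff:
            break
    return chosen


# Hypothetical logits for a library of four templates.
logits = [2.0, 0.5, -1.0, 1.0]
print([i for i, _ in select_templates(logits, cumulative_cutoff=0.9)])  # [0, 3, 1]
print([i for i, _ in select_templates(logits, max_templates=2)])  # [0, 3]
```

Capping by both count and cumulative probability keeps the branching factor of the tree search bounded. It still admits more templates when the network is uncertain.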
