Abstract

Computer-aided synthesis has received much attention in recent years. It is a challenging topic in itself, due to the high dimensionality of chemical and reaction space. It becomes even more challenging when the aim is to suggest syntheses that can be performed in continuous flow. Though continuous flow offers many potential benefits, not all reactions are suited to continuous operation. In this work, three machine learning models have been developed to assess whether a given reaction may benefit from continuous operation, what the likelihood of success in continuous flow is for a certain set of reaction components (i.e. reactants, reagents, solvents, catalysts, and products) and, if the likelihood of success is low, which alternative reaction components can be considered. The first model uses an abstract version of a reaction template, obtained via Gaussian mixture modelling, to quantify its relative increase in publishing frequency in continuous flow, without relying on potentially ambiguously defined reaction templates. The second model is an artificial neural network that categorizes feasible and infeasible reaction components with a 75% success rate. A set of reaction components is considered feasible if there is an explicit reference to it being used in continuous synthesis in the database; all other reaction components are considered infeasible. While several cases that are ‘infeasible’ by this definition are classified as feasible by the neural network, further analysis shows that for many of these cases it is at least plausible that they are in fact feasible; they simply have not been tested to (dis)prove this. The final model suggests alternative continuous flow components with a top-1 accuracy of 95%. Combined, the three models offer a black-box evaluation of whether a reaction and a set of reaction components can be considered promising for continuous synthesis.
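The feasibility labelling rule and the top-1 accuracy metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `label_feasible` and `top1_accuracy`, the representation of a component set as a `frozenset` of strings, and the toy data are all assumptions made for illustration.

```python
from typing import FrozenSet, Sequence, Set


def label_feasible(components: FrozenSet[str],
                   flow_references: Set[FrozenSet[str]]) -> int:
    """Return 1 if this exact component set has an explicit flow reference, else 0.

    A label of 0 means "no reference found", not "shown to fail in flow".
    """
    return 1 if components in flow_references else 0


def top1_accuracy(ranked: Sequence[Sequence[str]],
                  targets: Sequence[str]) -> float:
    """Fraction of cases where the first-ranked suggestion equals the target."""
    if len(ranked) != len(targets):
        raise ValueError("ranked and targets must have the same length")
    if not targets:
        raise ValueError("at least one case is required")
    hits = sum(1 for r, t in zip(ranked, targets) if len(r) > 0 and r[0] == t)
    return hits / len(targets)


# Hypothetical toy data, not taken from the paper.
flow_references = {frozenset({"aryl bromide", "n-BuLi", "THF"})}
seen = label_feasible(frozenset({"THF", "n-BuLi", "aryl bromide"}), flow_references)
unseen = label_feasible(frozenset({"aryl bromide", "n-BuLi", "DMSO"}), flow_references)
accuracy = top1_accuracy([["THF", "MeCN"], ["MeCN", "THF"]], ["THF", "THF"])
```

Under this rule a negative label only records the absence of a reference, which is why a classifier that marks some 'infeasible' sets as feasible is not necessarily wrong.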

Highlights

  • The development of new active pharmaceutical ingredients (APIs) is a time-consuming and expensive process (DiMasi et al, 1991, 2003), with up to half of the total cost being spent in the pre-clinical phase (Adams and Van Brantner, 2006)

  • The search for syntheses for target molecules was formalized as retrosynthetic analysis in the 1960s (Corey, 1967; Corey and Wipke, 1969), and ever since, attempts have been made to automate it through computer-aided synthesis planning (CASP)

  • This was done via a rule-based approach, using a predefined set of chemical reaction templates, which were iteratively applied to the target molecule and its subsequent precursors (Law et al, 2009; Christ et al, 2012; Bøgevig et al, 2015)


Summary

INTRODUCTION

The development of new active pharmaceutical ingredients (APIs) is a time-consuming and expensive process (DiMasi et al, 1991, 2003), with up to half of the total cost being spent in the pre-clinical phase (Adams and Van Brantner, 2006).

Following previously reported heuristic extraction procedures (Law et al, 2009; Bøgevig et al, 2015; Coley et al, 2019a), 2,586 reaction templates are identified from the FRD, in contrast to 2.9 million reaction templates from the entire Reaxys database, of which 366,000 are represented by five or more different reaction examples. This indicates that the scientific community has been focusing on rather specific types of chemistry when performing continuous synthesis. The above analysis of the publication frequency of chemical reaction templates, solvents and reagents in continuous organic synthesis has shown that there are strong preferences toward specific types of chemistry and chemicals for continuous operation. In other words, both the FRD and FCD datasets are strongly biased.

The meaning of the scores is similar to that of the category scores: 0 indicates that the reaction has no benefit from flow based on published data, whereas values close to 1 indicate the opposite.
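The "five or more different reaction examples" criterion quoted for the template counts can be sketched as a simple support filter. The sketch below is an assumption about how such a count might be made, not the extraction procedure used in the paper: the function name `templates_with_support`, the `(template_id, reaction_id)` pair format, and the toy identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def templates_with_support(pairs: Iterable[Tuple[str, str]],
                           min_examples: int = 5) -> Dict[str, int]:
    """Keep templates backed by at least `min_examples` distinct reactions.

    `pairs` holds (template_id, reaction_id) tuples; repeated pairs are
    counted once so that only different reaction examples add support.
    """
    unique_pairs = set(pairs)
    counts = Counter(template for template, _ in unique_pairs)
    return {t: n for t, n in counts.items() if n >= min_examples}


# Hypothetical toy data: T1 has five distinct examples (one listed twice),
# T2 has a single example repeated six times.
pairs = [("T1", f"r{i}") for i in range(5)] + [("T1", "r0")] + [("T2", "r1")] * 6
supported = templates_with_support(pairs)
```

Deduplicating before counting matters here: a template seen many times in the same reaction gains no additional support.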

