Reformulating Reactivity Design for Data-Efficient Machine Learning.

Toby Lewis-Atwell, Daniel Beechey, Özgür Şimşek, Matthew N Grayson

doi:10.1021/acscatal.3c02513

Abstract

Machine learning (ML) can deliver rapid and accurate reaction barrier predictions for use in rational reactivity design. However, model training requires large data sets, typically of thousands or tens of thousands of barriers, that are very expensive to obtain computationally or experimentally. Furthermore, bespoke data sets are required for each region of interest in reaction space because models typically struggle to generalize. We have therefore reformulated the ML barrier prediction problem toward a much more data-efficient process: finding a reaction from a prespecified set with a desired target value. Our reformulation enables the rapid selection of reactions with purpose-specific activation barriers, for example, in the design of reactivity and selectivity in synthesis, catalyst design, toxicology, and covalent drug discovery, requiring just tens of accurately measured barriers. Importantly, our reformulation does not require generalization beyond the domain of the data set at hand, and we show excellent results for the highly toxicologically and synthetically relevant data sets of aza-Michael addition and transition-metal-catalyzed dihydrogen activation, typically requiring fewer than 20 accurately measured density functional theory (DFT) barriers. Even for incomplete data sets of E2 and SN2 reactions, with large fractions of missing barriers (74% and 56%, respectively), our chosen ML search method still requires significantly fewer data points than the hundreds or thousands needed for more conventional uses of ML to predict activation barriers. Finally, we include a case study in which we use our process to guide the optimization of the dihydrogen activation catalyst. Our approach was able to identify a reaction within 1 kcal mol⁻¹ of the target barrier by running only 12 DFT reaction barrier calculations, which illustrates the usage and real-world applicability of this reformulation for systems of high synthetic importance.
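The abstract does not spell out the search algorithm itself, so the following is only a minimal sketch of the general idea of the reformulation, assuming: a fixed pool of candidate reactions described by numeric feature vectors; an `oracle` that stands in for one DFT barrier calculation and returns `None` where a barrier is unavailable (as in the incomplete E2 and SN2 sets); and a simple inverse-distance-weighted surrogate with a greedy "closest predicted barrier to the target" selection rule. The names `search_for_target` and `idw_predict`, the surrogate, the warm-up size, and the stopping rule are all hypothetical illustrations, not the authors' chosen ML search method.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Features = Sequence[float]


def idw_predict(x: Features, xs: List[Features], ys: List[float],
                power: float = 2.0) -> float:
    """Hypothetical surrogate: inverse-distance-weighted average of measured barriers."""
    weights = []
    for xi, yi in zip(xs, ys):
        d = math.dist(x, xi)
        if d == 0.0:
            return yi  # identical features: reuse the measured barrier directly
        weights.append(1.0 / d ** power)
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def search_for_target(
    candidates: List[Features],
    oracle: Callable[[int], Optional[float]],  # stand-in for one DFT calculation
    target: float,
    tol: float = 1.0,  # e.g. kcal/mol
    n_init: int = 3,
    seed: int = 0,
) -> Tuple[Optional[int], Optional[float], int]:
    """Sequentially query reactions from a fixed pool until one is within tol of target.

    Returns (index, barrier) of the closest measured reaction and the number of
    oracle calls made (calls that return None are counted too, in this sketch).
    """
    rng = random.Random(seed)
    untried = list(range(len(candidates)))
    rng.shuffle(untried)
    xs: List[Features] = []
    ys: List[float] = []
    calls = 0
    best_idx: Optional[int] = None
    best_y: Optional[float] = None

    while untried:
        if len(ys) < max(n_init, 1):
            idx = untried.pop()  # random warm-up picks until the surrogate has data
        else:
            # Greedy choice: the untried reaction whose predicted barrier is nearest the target.
            idx = min(untried,
                      key=lambda i: abs(idw_predict(candidates[i], xs, ys) - target))
            untried.remove(idx)

        calls += 1
        y = oracle(idx)
        if y is None:
            continue  # missing barrier: nothing to learn from, move on

        xs.append(candidates[idx])
        ys.append(y)
        if best_y is None or abs(y - target) < abs(best_y - target):
            best_idx, best_y = idx, y
        if abs(y - target) <= tol:
            break  # found a reaction close enough to the target barrier

    return best_idx, best_y, calls


if __name__ == "__main__":
    # Toy pool: one feature per reaction and a synthetic linear "barrier".
    pool = [(float(i),) for i in range(20)]
    table = {i: 10.0 + 2.0 * i for i in range(20)}
    print(search_for_target(pool, lambda i: table[i], target=30.0))
```

In this framing the quantity to minimize is the returned number of oracle calls, the analogue of the number of DFT barrier calculations reported in the abstract. A more realistic version would swap the toy surrogate for a model that also reports uncertainty, so that the selection rule can trade off exploring poorly covered reactions against exploiting confident predictions.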

Journal: ACS Catalysis | Publication Date: Oct 6, 2023
Citations: 3 | License type: CC BY 4.0

Similar Papers

Competition between Elimination and Substitution for Ambident Nucleophiles CN⁻ and Iodoethane Reactions in Gaseous and Aqueous Medium.
Xu Liu ... Wenyu Guo
The Journal of Physical Chemistry A | VOL. 127
28 Aug 2023

Theory of the Kinetics of Chemical Potentials in Heterogeneous Catalysis
Jun Cheng ... P Hu
Angewandte Chemie International Edition | VOL. 50
29 Jun 2011

Chemist versus Machine: Traditional Knowledge versus Machine Learning Techniques
Janine George ... Geoffroy Hautier
Trends in Chemistry | VOL. 3
09 Nov 2020

How Alkyl Halide Structure Affects E2 and SN2 Reaction Barriers: E2 Reactions Are as Sensitive as SN2 Reactions
Paul R Rablen ... Brandon J Karlow
The Journal of Organic Chemistry | VOL. 79
17 Jan 2014