Abstract

Bond dissociation enthalpies (BDEs) of organic molecules play a fundamental role in determining chemical reactivity and selectivity. However, BDE computations at sufficiently high levels of quantum mechanical theory require substantial computing resources. In this paper, we develop a machine learning model capable of accurately predicting BDEs for organic molecules in a fraction of a second. We perform automated density functional theory (DFT) calculations at the M06-2X/def2-TZVP level of theory for 42,577 small organic molecules, resulting in 290,664 BDEs. A graph neural network trained on a subset of these results achieves a mean absolute error of 0.58 kcal mol−1 (vs DFT) for BDEs of unseen molecules. We further demonstrate the model on two applications: first, we rapidly and accurately predict major sites of hydrogen abstraction in the metabolism of drug-like molecules, and second, we determine the dominant molecular fragmentation pathways during soot formation.

Highlights

  • To ensure that the resulting machine learning (ML) method closely reproduced experimentally determined bond dissociation enthalpies (BDEs), we performed a benchmark study of common density functional theory (DFT) and ab initio methods.

  • Among the DFT methods, the choice of basis set appeared to have the greatest impact on accuracy, with the M06-2X/def2-TZVP combination coming very close to CCSD(T) accuracy.

  • The mean absolute error (MAE) of the three density functionals followed the order B3LYP-D3 > ωB97XD > M06-2X for both basis sets.
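The MAE comparison in the benchmark highlights can be illustrated with a minimal Python sketch. The method names and BDE values below are placeholders chosen for illustration, not data from the paper; only the MAE definition itself (the mean of the absolute deviations from the reference values) is standard.

```python
from statistics import mean


def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over paired BDE values (kcal/mol)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return mean(abs(p - r) for p, r in zip(predicted, reference))


# Placeholder reference BDEs (kcal/mol); NOT values from the paper.
reference_bdes = [105.0, 98.5, 88.0, 110.2]

# Placeholder results for two hypothetical methods; NOT values from the paper.
method_bdes = {
    "method_A": [103.0, 97.0, 86.5, 108.0],
    "method_B": [104.5, 98.0, 88.5, 110.0],
}

# MAE of each method against the reference set.
mae_by_method = {
    name: mean_absolute_error(values, reference_bdes)
    for name, values in method_bdes.items()
}

# Methods sorted from lowest to highest MAE (most to least accurate).
ranking = sorted(mae_by_method, key=mae_by_method.get)

for name in ranking:
    print(f"{name}: MAE = {mae_by_method[name]:.3f} kcal/mol")
```

A lower MAE means closer agreement with the reference, so an ordering such as B3LYP-D3 > ωB97XD > M06-2X in MAE corresponds to M06-2X being the most accurate of the three.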

Summary

Results

Evaluation of QM methods for calculating homolytic BDEs. To ensure that the resulting ML method closely reproduced experimentally determined BDEs, we performed a benchmark study of common DFT and ab initio methods. To verify that ALFABET predictions are accurate for BDEs of drug molecules much larger than those used to construct the training set, DFT calculations were performed for 82 top-selling drug molecules [54]. These molecules ranged in size from 8 to 34 heavy atoms. In cross-validation, the new model, based on ALFABET predictions, achieves a weighted least-squares loss less than half that of a recently developed group-contribution model on the same dataset (Fig. 8b) [56]. These results demonstrate that ALFABET predictions can improve forward screening approaches in which bond energy is an important parameter. For the 91 molecules with yield sooting index (YSI) measurements and between 11 and 20 heavy atoms, DFT calculations were performed to confirm the predicted BDEs. The resulting prediction error was even lower than for the withheld test set predictions (Fig. 8c), demonstrating the ability of the model to scale to larger molecules.
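For readers unfamiliar with how a homolytic BDE is obtained from QM output, the sketch below applies the standard definition BDE = H(A•) + H(B•) − H(A–B). The function name and the enthalpy values are hypothetical and chosen only for illustration; they are not taken from the paper's calculations, and the sketch does not reproduce the authors' workflow.

```python
# Standard conversion factor: 1 hartree is about 627.5095 kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.5095


def homolytic_bde(h_parent, h_radical_a, h_radical_b):
    """Return the homolytic BDE in kcal/mol from enthalpies in hartree.

    BDE = H(A.) + H(B.) - H(A-B): the enthalpy of the two radical
    fragments minus the enthalpy of the intact parent molecule.
    """
    delta_hartree = (h_radical_a + h_radical_b) - h_parent
    return delta_hartree * HARTREE_TO_KCAL_PER_MOL


# Made-up enthalpies (hartree) for a parent molecule and its two
# radical fragments; NOT results from the paper.
bde = homolytic_bde(h_parent=-40.00, h_radical_a=-39.34, h_radical_b=-0.50)
print(f"BDE = {bde:.1f} kcal/mol")
```

With these placeholder inputs the enthalpy difference is 0.16 hartree, or roughly 100.4 kcal/mol, which is within the typical range for C–H bonds.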

