Abstract

Machine learning techniques are increasingly used to predict molecular properties. Such models are often trained on large computationally derived datasets and are only as accurate as the underlying data. We exploit the well-known error-cancelling effect of isodesmic and homodesmotic reactions to develop a multi-fidelity, data-driven molecular property prediction method. First, we propose an optimization-based scheme to quickly and automatically identify all isofragmented reactions for a target molecule u, i.e., balanced reactions involving u and one or more molecules from a given set of molecules (M) that conserve a predefined set of fragments of arbitrary size. Second, we show that such isofragmented reactions can be leveraged to improve the predictive accuracy of a data-driven model by infusing a small high-accuracy dataset comprising molecules in M. We applied this method with a high-accuracy subset of the NIST thermochemistry database and a simple additive data-driven model trained on a QM9 subset. Our results show that the heats of formation predicted with our method were ∼4.4 kcal/mol more accurate on average than those from the data-driven model alone.
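
The two steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: a brute-force search over small integer coefficients stands in for the optimization-based scheme, the fragments are simply bond types (C-C and C-H), and the function names and all heat-of-formation values are hypothetical placeholders chosen for illustration. The correction step assumes that the error of the low-fidelity model cancels across the reaction, so that Hf_high(u) ≈ Hf_low(u) + Σ ν_i [Hf_high(m_i) − Hf_low(m_i)], where ν_i is the coefficient of reference molecule m_i.

```python
from itertools import product

# Hypothetical fragment counts (C-C bonds, C-H bonds) for each molecule.
FRAGMENTS = {
    "propane": (2, 8),
    "ethane": (1, 6),
    "methane": (0, 4),
}


def find_reactions(target, references, fragments, max_coeff=3):
    """Return coefficient sets nu with f(target) == sum_i nu_i * f(m_i).

    A positive nu_i puts m_i on the product side of "target -> ...";
    a negative nu_i puts it on the reactant side next to the target.
    Brute force is used here only as a stand-in for a proper optimizer.
    """
    f_target = fragments[target]
    found = []
    for nu in product(range(-max_coeff, max_coeff + 1), repeat=len(references)):
        totals = tuple(
            sum(c * fragments[m][k] for c, m in zip(nu, references))
            for k in range(len(f_target))
        )
        if totals == f_target:
            found.append(dict(zip(references, nu)))
    # List reactions involving the fewest molecules first.
    return sorted(found, key=lambda r: sum(abs(c) for c in r.values()))


def corrected_hf(target, reaction, hf_low, hf_high):
    """Shift the low-fidelity estimate by the reference molecules' known errors."""
    shift = sum(nu * (hf_high[m] - hf_low[m]) for m, nu in reaction.items())
    return hf_low[target] + shift


# Placeholder heats of formation in kcal/mol (illustrative, not real data).
hf_low = {"propane": -22.0, "ethane": -18.5, "methane": -16.0}
hf_high = {"ethane": -20.0, "methane": -17.9}

reactions = find_reactions("propane", ["ethane", "methane"], FRAGMENTS)
estimate = corrected_hf("propane", reactions[0], hf_low, hf_high)
print(reactions[0], round(estimate, 2))
```

In this toy case the only fragment-conserving reaction is propane + methane → 2 ethane, which is the classic isodesmic reaction for propane. A realistic version would use larger fragments, an explicit atom-balance constraint, and an integer-programming solver in place of enumeration.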
