Abstract
We present a group contribution method (SoluteGC) and a machine learning model (SoluteML) to predict the Abraham solute parameters, as well as a machine learning model (DirectML) to predict solvation free energy and enthalpy at 298 K. The proposed group contribution method uses atom-centered functional groups with corrections for ring and polycyclic strain, while the machine learning models adopt a directed message passing neural network. The solute parameters predicted by SoluteGC and SoluteML are used to calculate solvation free energy and enthalpy via linear free energy relationships. Extensive data sets containing 8366 solute parameters, 20,253 solvation free energies, and 6322 solvation enthalpies are compiled in this work to train the models. The three models are each evaluated on the same test sets, using both random and substructure-based solute splits, for solvation free energy and enthalpy predictions. The results show that the DirectML model is superior to the SoluteML and SoluteGC models for both predictions and can provide accuracy comparable to that of advanced quantum chemistry methods. Although the DirectML model performs better in general, all three models are useful for various purposes. Uncertain predicted values can be identified by comparing the three models, and when the three models are combined, they can provide more accurate predictions than any one of them individually. Finally, we present our compiled solute parameter, solvation free energy, and solvation enthalpy databases (SoluteDB, dGsolvDBx, dHsolvDB) and provide public access to our final prediction models through a simple web-based tool, software packages, and source code.
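The linear free energy relationship step mentioned above can be illustrated with a minimal sketch. It assumes the commonly used Abraham form, log K = c + eE + sS + aA + bB + lL, with ∆Gsolv = −RT ln(10) log K, and an analogous linear form for the solvation enthalpy (Mintz-type coefficients). The function names and all numerical values below are hypothetical placeholders, not parameters from SoluteDB or from the models described here.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)
T_REF = 298.15         # reference temperature in K

def lfer(solute, coeffs):
    """Evaluate c + e*E + s*S + a*A + b*B + l*L.

    With Abraham solvent coefficients this is assumed to give log10 K;
    with Mintz-type coefficients it is assumed to give the solvation enthalpy.
    """
    return (coeffs["c"]
            + coeffs["e"] * solute["E"]
            + coeffs["s"] * solute["S"]
            + coeffs["a"] * solute["A"]
            + coeffs["b"] * solute["B"]
            + coeffs["l"] * solute["L"])

def dg_solv_from_log_k(log_k, temperature=T_REF):
    """Convert log10 K (gas-to-solvent partition) to a free energy in kJ/mol."""
    return -R_KJ * temperature * math.log(10) * log_k

# Hypothetical solute parameters and solvent coefficients (illustration only).
example_solute = {"E": 0.5, "S": 0.5, "A": 0.0, "B": 0.5, "L": 3.0}
example_solvent = {"c": 0.2, "e": -0.1, "s": 1.0, "a": 3.0, "b": 1.0, "l": 0.8}

example_log_k = lfer(example_solute, example_solvent)
example_dg = dg_solv_from_log_k(example_log_k)
```

In this sketch the solute dictionary would come from SoluteGC or SoluteML, while the solvent coefficients would come from a table of fitted solvent parameters; only solvents with such coefficients can be handled this way.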
Highlights
Information on solvation free energy aids in the selection of viable solvents in chemical processes such as the synthesis of organic molecules,[1,2] optimization of purification processes,[3] and pollutant level management.[4]
We evaluate the performance of the three models (SoluteGC, SoluteML, and DirectML) on 10% test sets for both random and substructure-based solute splits.
For the comparison of the three models, the test solvents are limited to those with Abraham or Mintz solvent parameters, since the SoluteGC and SoluteML models can be evaluated on only those solvents.
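Comparing the three models on the same solute-solvent pair also gives a simple way to flag uncertain predictions and to form a combined estimate. The sketch below is an assumption about how this could be done: it uses the arithmetic mean as the combined value and the max-min spread as a disagreement measure. The source does not specify the combination rule, and the `tolerance` threshold and the example numbers are hypothetical.

```python
from statistics import mean

def combine_predictions(predictions, tolerance=1.0):
    """Combine per-model predictions given in the same unit.

    predictions: dict mapping model name -> predicted value.
    tolerance:   hypothetical spread above which the result is flagged.
    """
    values = list(predictions.values())
    spread = max(values) - min(values)
    return {
        "mean": mean(values),           # assumed combination rule: simple average
        "spread": spread,               # disagreement between the models
        "uncertain": spread > tolerance,
    }

# Hypothetical predictions for one solute-solvent pair (illustration only).
example_predictions = {"SoluteGC": -4.0, "SoluteML": -4.5, "DirectML": -5.0}
example_combined = combine_predictions(example_predictions, tolerance=0.5)
```

A weighted average that favors the model with the lowest test error would be an equally plausible choice; the unweighted mean is used here only to keep the sketch short.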
Summary
Information on solvation free energy aids in the selection of viable solvents in chemical processes such as the synthesis of organic molecules,[1,2] optimization of purification processes,[3] and pollutant level management.[4] The solvation Gibbs free energy (∆Gsolv) of a solute in a solvent is directly related to that solute’s partition coefficient between the gas and solvent phases. This property is typically reported at room temperature and can be a valuable feature for the prediction of the solute’s liquid-liquid partition coefficient and solid solubility in organic solvents. We aim to provide three things: improved predictions of ∆Gsolv(298 K) and ∆Hsolv(298 K), which can be used for the calculation of ∆Gsolv at elevated temperatures; an easy-access tool for our predictive models; and new databases for ∆Gsolv(298 K) and ∆Hsolv(298 K).
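The two relations in this summary can be made concrete with a short sketch. It assumes ∆Gsolv = −RT ln K for the gas-to-solvent partition coefficient K, and it estimates ∆Gsolv at another temperature by treating the solvation enthalpy and entropy at 298 K as temperature-independent. That constant-enthalpy approximation is an assumption of this sketch, not necessarily the approach used by the authors, and the function names and input values are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)
T_REF = 298.15         # reference temperature in K

def log10_k_from_dg(dg_solv, temperature=T_REF):
    """log10 of the gas-to-solvent partition coefficient from dG in kJ/mol."""
    return -dg_solv / (R_KJ * temperature * math.log(10))

def dg_solv_at_temperature(dg_ref, dh_ref, temperature):
    """Estimate dG(T) from dG(298 K) and dH(298 K), both in kJ/mol.

    Assumes dH and dS do not change with temperature, so
    dS = (dH - dG) / T_REF and dG(T) = dH - T * dS.
    """
    ds_ref = (dh_ref - dg_ref) / T_REF
    return dh_ref - temperature * ds_ref

# Hypothetical values in kJ/mol (illustration only).
example_log_k = log10_k_from_dg(-20.0)
example_dg_350 = dg_solv_at_temperature(-20.0, -40.0, 350.0)
```

The approximation degrades as the temperature moves further from 298 K, where heat capacity effects are no longer negligible.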