Abstract

We present machine learning models that predict experimental hydration free energies of molecules without any atom-, bond-, or geometry-specific input features. Four types of physically inspired descriptors are used. The first type consists of the total dipole moment, the anisotropic polarizability, and the results of a vibrational analysis of the solute molecule. The second and third types are derived from the electrostatic potential distribution of the solute. The last type includes the solvent-accessible surface area and shape similarities. Several machine learning regression models built on the FreeSolv database (∼600 samples) outperform most traditional approaches as well as other prediction methods based on molecular fingerprints. In particular, the present descriptors can predict hydration free energies of new compounds containing elements or fragments that never appear in the training set. We also discuss the importance of these descriptors, the impact of the dissociation energies of specific covalent bonds, and the outliers with relatively large prediction errors.
