Abstract

Quantum chemical descriptors were calculated in Gaussian 09 using the GEDIIS/GDIIS optimizer. Five of these descriptors, related to molecular polarizability, atomic charges, and the charge product between solvent and solute, were selected as the optimal descriptor subset for developing quantitative structure–property relationship (QSPR) models of 7215 enthalpies of solvation and vaporization. The random forest (RF) algorithm was used to develop RF Model Ⅰ, whose division into training and test sets was based mainly on solvent type. For RF Model Ⅰ, the training set contains n = 3633 enthalpies of solvation and vaporization, with a coefficient of determination R² = 0.986 and a root mean square (rms) error of 2.598 kJ/mol; the test set has n = 3582, R² = 0.933, and rms = 5.501 kJ/mol. RF Model Ⅵ, whose training set was selected with the Kennard-Stone algorithm, gives n = 4810, R² = 0.987, rms = 2.550 kJ/mol (training set) and n = 2405, R² = 0.940, rms = 4.659 kJ/mol (test set). These statistics compare favorably with those of other QSPR models for enthalpies of solvation reported in the literature. Furthermore, RF Models Ⅰ and Ⅵ, which are built on large data sets, can be used to predict both solvation enthalpies and vaporization enthalpies.
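To make the Kennard-Stone split and the reported statistics concrete, the following is a minimal standard-library sketch, not the authors' code. The function names `kennard_stone` and `r2_and_rms` are hypothetical, Euclidean distance in descriptor space is assumed, and R² is assumed to be computed as 1 − SS_res/SS_tot (the abstract does not say which definition was used). Fitting the random forest itself is omitted.

```python
import math


def kennard_stone(X, n_train):
    """Return (train_idx, test_idx) chosen by Kennard-Stone selection.

    X is a list of descriptor vectors. Assumption: Euclidean distance.
    """
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")
    # Full pairwise distance matrix (O(n^2) memory; fine for a sketch).
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Seed the training set with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: dist[p[0]][p[1]],
    )
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}
    # Distance from each candidate to its nearest already-selected sample.
    min_d = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}
    while len(selected) < n_train:
        # Pick the candidate farthest from the current training set.
        nxt = max(sorted(remaining), key=lambda k: min_d[k])
        selected.append(nxt)
        remaining.remove(nxt)
        del min_d[nxt]
        for k in remaining:
            min_d[k] = min(min_d[k], dist[k][nxt])
    return selected, sorted(remaining)


def r2_and_rms(y_true, y_pred):
    """Return (R^2, rms error), with R^2 taken as 1 - SS_res / SS_tot."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)
```

With the 7215 data points described above, a call such as `kennard_stone(X, 4810)` would reproduce the 4810/2405 partition sizes of RF Model Ⅵ, although a full distance matrix of that size calls for a vectorized implementation in practice.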
