A simple QSPR model, based on seven 1D and 2D descriptors and an artificial neural network, was developed for fast evaluation of aqueous solubility. The model predicted the molar solubility of a diverse set of 1312 organic compounds with an overall correlation coefficient of 0.92 and a standard deviation of 0.72 log units between the calculated and experimental data. Given that the estimated uncertainty of the experimental data is no less than 0.5 log units, the results demonstrate that carefully chosen, physically meaningful 1D and 2D descriptors encode sufficient molecular information for fast and reasonably reliable prediction of aqueous solubility with a simple neural network. For comparison, we calculated the solubility of a test set of 258 compounds, ranging from simple hydrocarbons to more complex multifunctional organic molecules, with a commercial program (QMPR+ version 2.0.1 of SimulationPlus Inc.) and compared the results with the predictions from our model. The statistical parameters indicate that QMPR+ outperforms our model for small and simple organic compounds, whereas our model is superior for more complex multifunctional molecules.
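The two summary statistics quoted above can be illustrated with a minimal sketch. It assumes that the correlation coefficient is the Pearson r and that the standard deviation is the root-mean-square difference between calculated and experimental log S; the abstract does not state the exact definitions used. The function names and the five data points are hypothetical and are not taken from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def rms_deviation(xs, ys):
    """Root-mean-square difference between paired values (assumed definition)."""
    n = len(xs)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / n)

# Hypothetical log S values (log mol/L), invented for illustration only.
experimental = [-1.0, -2.5, -3.2, -4.8, -0.4]
calculated = [-1.3, -2.1, -3.9, -4.5, -0.9]

r = pearson_r(experimental, calculated)
s = rms_deviation(experimental, calculated)
print(f"r = {r:.2f}, s = {s:.2f} log units")
```

Under these assumptions, a deviation of 0.72 log units corresponds to a typical error of roughly a factor of five in molar solubility (10^0.72 ≈ 5.2), which is why it is compared against the 0.5 log unit experimental uncertainty.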