Abstract

In the present study, geometrical structures of 122 nonionic organic compounds were constructed and optimized at the quantum-mechanical HF/6-31G* level of theory. The molecular electrostatic potentials were computed, and structural descriptors were derived from them. Gaussian process (GP) and, for comparison, multiple linear regression (MLR) and support vector machine (SVM) were then employed to build quantitative structure-bioconcentration factor relationships. Systematic validation, including internal leave-one-out cross-validation, validation on an external test set, and a more rigorous Monte Carlo cross-validation, was performed to confirm the reliability of the constructed models. The quantities derived from the electrostatic potential, V_min and ∑V_s,ind^-, together with the molecular volume (V_mc), the dipole moment (μ), and the energy level of the highest occupied molecular orbital (E_HOMO), were found to describe the quantitative structure-property relationship of this sample set well. Both linear and nonlinear models give satisfactory results, and the GP, which can handle hybrid linear and nonlinear relationships through a mixed covariance function, appears to have better fitting and predictive abilities than the other two statistical methods. For the external test set, the coefficient of determination (r_pred^2) and the root mean square error of prediction (RMSEP) are 0.953 and 0.337, respectively.
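
As a rough illustration of two ideas mentioned above, the following Python sketch shows one possible form of a mixed covariance function (a linear term plus a squared-exponential term) and one common way of computing RMSEP and r_pred^2 for a test set. The abstract does not give the exact kernel or the exact definition of r_pred^2 used in the study, so the function names, the weights `w_lin` and `w_rbf`, the `length_scale` parameter, and the choice of the test-set mean in r_pred^2 are all assumptions made for illustration only.

```python
import math


def mixed_covariance(x, y, w_lin=1.0, w_rbf=1.0, length_scale=1.0):
    """Hypothetical mixed kernel: weighted linear term + squared-exponential term.

    The linear term captures linear descriptor-property trends, while the
    squared-exponential term captures smooth nonlinear ones. The weights and
    length scale are illustrative assumptions, not values from the study.
    """
    dot = sum(a * b for a, b in zip(x, y))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return w_lin * dot + w_rbf * math.exp(-sq_dist / (2.0 * length_scale ** 2))


def rmsep(observed, predicted):
    """Root mean square error of prediction over a test set."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r2_pred(observed, predicted):
    """Coefficient of determination for a test set.

    Assumption: computed as 1 - SS_res / SS_tot with the mean of the observed
    test values; some authors use the training-set mean instead.
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Because the sum of two valid covariance functions is itself a valid covariance function, a kernel of this shape lets a single GP model represent linear and nonlinear contributions together, with their relative weights typically fitted as hyperparameters.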

