Abstract

Several non-parametric regressors have been applied to modelling quantitative structure–activity relationship (QSAR) data. Their performance was benchmarked against multilinear regression and the nonlinear method of smoothing splines. Variable selection was explored through systematic combinations of different variables and combinations of principal components. For the training set examined (539 inhibitors of the tyrosine kinase Syk), the best two-descriptor model had a 5-fold cross-validated q² of 0.43. This model was generated by a multivariate Nadaraya–Watson kernel estimator. A subsequent, independent test set of 371 similar chemical entities showed that the model had some predictive power. Other approaches did not perform as well. A modest increase in predictive ability can be achieved with three descriptors, but the resulting model is harder to visualise. We conclude that non-parametric regression offers a potentially powerful approach to identifying predictive, low-dimensional QSARs.
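As an illustration of the kind of model described above, the following is a minimal sketch of a multivariate Nadaraya–Watson estimator scored by a 5-fold cross-validated q². It is not the authors' implementation. The Gaussian product kernel, the fixed bandwidth, the definition q² = 1 − PRESS/SS_total, the function names `nw_predict` and `cross_validated_q2`, and the synthetic two-descriptor data are all assumptions made for the example; none of them comes from the paper, and the Syk data set is not used.

```python
import numpy as np


def nw_predict(X_train, y_train, X_query, bandwidth):
    """Nadaraya-Watson prediction: a kernel-weighted mean of training responses.

    Assumption: Gaussian product kernel; `bandwidth` is a scalar or one value
    per descriptor.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Scaled differences, shape (n_query, n_train, n_descriptors).
    diff = (X_query[:, None, :] - X_train[None, :, :]) / bandwidth
    # Log of the Gaussian kernel weights, shape (n_query, n_train).
    log_w = -0.5 * np.sum(diff ** 2, axis=2)
    # Subtract each row's maximum so at least one weight is exactly 1,
    # which avoids division by zero for queries far from all training points.
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    return (w @ y_train) / w.sum(axis=1)


def cross_validated_q2(X, y, bandwidth, n_folds=5, seed=0):
    """k-fold cross-validated q2, taken here as 1 - PRESS / SS_total."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    press = 0.0  # predictive residual sum of squares over held-out points
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        pred = nw_predict(X[train], y[train], X[held_out], bandwidth)
        press += np.sum((y[held_out] - pred) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_total


# Synthetic two-descriptor example (hypothetical data, not the Syk set).
rng = np.random.default_rng(1)
X_demo = rng.uniform(-1.0, 1.0, size=(200, 2))
y_demo = np.sin(3.0 * X_demo[:, 0]) + X_demo[:, 1] ** 2
y_demo = y_demo + 0.1 * rng.normal(size=200)
q2_demo = cross_validated_q2(X_demo, y_demo, bandwidth=0.2)
print(f"5-fold cross-validated q2 on synthetic data: {q2_demo:.2f}")
```

In practice the bandwidth would itself be tuned, for example by repeating the cross-validation over a grid of candidate values, and descriptors on different scales would need standardising or per-descriptor bandwidths.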

