Abstract

Linear modelling techniques such as multiple linear regression (MLR), principal component regression (PCR) and partial least squares (PLS) are generally used to model quantitative structure-activity relationship (QSAR) data. Such data can be very complex, and linear techniques often capture only a limited part of the information they contain. This study attempted to combine linear techniques with the flexible non-linear technique multivariate adaptive regression splines (MARS). Models were built using, as the starting point for the MARS algorithm, an MLR model (combined with either a stepwise procedure or a genetic algorithm for variable selection), a PCR model or a PLS model. The descriptive and predictive power of the models was evaluated in a QSAR context and compared with the performance of the individual linear models and of MARS alone. In general, the combined methods gave significant improvements over the linear models and can be considered valuable techniques for modelling complex QSAR data. For the data set studied, the best model was obtained by combining PLS and MARS. This combination yielded a model with a Pearson correlation coefficient of 0.90 and a cross-validation error of 9.9%, evaluated with 10-fold cross-validation, indicating good descriptive and high predictive properties.
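
The idea of extending a linear model with MARS-type terms, and of scoring the result by Pearson correlation and 10-fold cross-validation, can be illustrated with a minimal sketch. It is not the procedure used in the study: the abstract does not specify how the linear models seed MARS, so the sketch assumes the simplest reading, in which a one-descriptor linear fit is extended with a single hinge basis function max(0, x - knot) and all coefficients are refitted. The data are synthetic, every function name (`lstsq`, `fit_linear`, `fit_combined`, `cv_rmse`) is hypothetical, and the cross-validation error is reported as a root-mean-square error because the abstract does not define its percentage error.

```python
import random

def hinge(x, knot):
    """MARS-style hinge basis function max(0, x - knot)."""
    return max(0.0, x - knot)

def lstsq(rows, ys):
    """Least-squares coefficients via the normal equations (tiny problems only)."""
    p = len(rows[0])
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        if abs(A[c][c]) < 1e-12:
            return None  # singular design, caller skips this candidate
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def fit_linear(xs, ys):
    """Plain linear model y = a + b*x."""
    a, b = lstsq([[1.0, x] for x in xs], ys)
    return lambda x: a + b * x

def fit_combined(xs, ys):
    """Linear terms plus one hinge term; the knot is chosen by training SSE."""
    best = None
    for knot in sorted(set(xs))[1:-1]:  # interior knots only
        rows = [[1.0, x, hinge(x, knot)] for x in xs]
        coef = lstsq(rows, ys)
        if coef is None:
            continue
        sse = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
                  for r, y in zip(rows, ys))
        if best is None or sse < best[0]:
            best = (sse, knot, coef)
    if best is None:  # no usable knot: fall back to the linear model
        return fit_linear(xs, ys)
    _, knot, (a, b, c) = best
    return lambda x: a + b * x + c * hinge(x, knot)

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

def cv_rmse(xs, ys, fit, k=10, seed=0):
    """k-fold cross-validated root-mean-square prediction error."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    sq = 0.0
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq += sum((ys[i] - model(xs[i])) ** 2 for i in fold)
    return (sq / len(xs)) ** 0.5

# Synthetic one-descriptor data with a kink at x = 3 (an assumption, not QSAR data).
rng = random.Random(1)
xs = [i / 10 for i in range(60)]
ys = [x + 2.0 * hinge(x, 3.0) + rng.gauss(0, 0.1) for x in xs]

model = fit_combined(xs, ys)
r_combined = pearson([model(x) for x in xs], ys)  # descriptive power
cv_linear = cv_rmse(xs, ys, fit_linear)           # predictive power, linear only
cv_combined = cv_rmse(xs, ys, fit_combined)       # predictive power, linear + hinge
print(r_combined, cv_linear, cv_combined)
```

On data with a kink like this, the hinge term lets the combined model follow the change in slope that a single straight line cannot, which is the kind of gain over purely linear models that the abstract reports. A real MARS implementation adds many hinge terms and their products in a forward pass and then prunes them, which this sketch omits.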
