Abstract

In general, linear modelling techniques such as multiple linear regression (MLR), principal component regression (PCR) and partial least squares (PLS) are used to model QSAR data. Such data can be very complex, and linear techniques often capture only part of the information they contain. In this study, an attempt was made to combine linear techniques with the flexible non-linear technique multivariate adaptive regression splines (MARS). Each of the following linear models was used as the starting point for the MARS algorithm: an MLR model with variable selection by either a stepwise procedure or a genetic algorithm, a PCR model, or a PLS model. The descriptive and predictive power of the resulting models was evaluated in a QSAR context and compared with that of the individual linear models and of a single MARS model. In general, the combined methods significantly improved on the linear models and can be considered valuable techniques for modelling complex QSAR data. For the data set used, the best model was obtained by combining PLS and MARS. This combination yielded a Pearson correlation coefficient of 0.90 and a 10-fold cross-validation error of 9.9%, indicating good descriptive and high predictive properties.
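The abstract does not say exactly how a linear model serves as the starting point for MARS. The sketch below is one possible reading, offered as an assumption rather than the authors' procedure. A PLS model is fitted first, and its prediction enters the MARS basis as an initial term next to the intercept. Pairs of hinge functions max(0, x_j - t) and max(0, t - x_j) are then added by greedy forward selection on the residual sum of squares. The names `pls1_fit`, `fit_hybrid` and `cross_validated_rmse` are hypothetical. PLS is computed with the NIPALS PLS1 algorithm. Knots are restricted to a few quantiles, and the backward pruning and GCV criterion of full MARS are omitted. The data are synthetic, not the QSAR set. The cross-validation error is reported as an RMSE in response units, because the definition behind the 9.9% figure is not given here.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS PLS1 for one response; y_hat = (X - x_mean) @ coef + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean               # centred X block and response
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)                  # unit weight vector
        t = E @ w                               # scores
        tt = t @ t
        p, q = E.T @ t / tt, f @ t / tt         # X loadings and y loading
        E, f = E - np.outer(t, p), f - q * t    # deflate X and y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)      # B = W (P'W)^-1 q
    return x_mean, y_mean, coef


def fit_hybrid(X, y, n_components=2, max_pairs=3, n_knots=10):
    """Simplified MARS forward pass seeded with a PLS prediction (no pruning/GCV)."""
    x_mean, y_mean, coef = pls1_fit(X, y, n_components)
    terms = []                                  # selected (variable index, knot)

    def design(Xn):
        # Intercept, PLS prediction, then one hinge pair per selected term.
        cols = [np.ones(len(Xn)), (Xn - x_mean) @ coef + y_mean]
        for j, knot in terms:
            cols.append(np.maximum(0.0, Xn[:, j] - knot))
            cols.append(np.maximum(0.0, knot - Xn[:, j]))
        return np.column_stack(cols)

    for _ in range(max_pairs):
        best = None
        for j in range(X.shape[1]):
            for knot in np.quantile(X[:, j], np.linspace(0.1, 0.9, n_knots)):
                terms.append((j, knot))          # try this candidate pair
                B = design(X)
                beta, *_ = np.linalg.lstsq(B, y, rcond=None)
                rss = np.sum((y - B @ beta) ** 2)
                terms.pop()
                if best is None or rss < best[0]:
                    best = (rss, j, knot)
        terms.append(best[1:])                  # keep the best pair
    beta, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return lambda Xn: design(Xn) @ beta


def cross_validated_rmse(X, y, k=10, seed=0, **kwargs):
    """k-fold cross-validated root-mean-square error of fit_hybrid."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = np.empty(len(y))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        model = fit_hybrid(X[train], y[train], **kwargs)
        sq_err[fold] = (y[fold] - model(X[fold])) ** 2
    return np.sqrt(sq_err.mean())


# Synthetic example: linear effects plus one hinge-shaped non-linearity.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 5))
y = (1.5 * X[:, 0] - X[:, 1] + 2.0 * np.maximum(0.0, X[:, 2] - 0.5)
     + rng.normal(scale=0.1, size=120))
model = fit_hybrid(X, y)
r = np.corrcoef(y, model(X))[0, 1]              # descriptive power (Pearson r)
rmsecv = cross_validated_rmse(X, y)             # predictive power (10-fold CV)
print(f"Pearson r = {r:.2f}, RMSECV = {rmsecv:.3f}")
```

Because the PLS prediction is itself a column of the basis, the least-squares fit of the hybrid model can never have a larger training residual sum of squares than the PLS model it starts from. The hinge terms can only add non-linear structure on top of the linear description.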
