Abstract

Quantitative structure-activity relationship (QSAR) studies increasingly rely on large numbers of descriptors that encode molecular structure and topology. Choosing adequate descriptors for a QSAR study is important but difficult, because no absolute rules govern the choice. A variety of variable selection techniques have been applied to this problem, including stepwise regression, partial least squares/principal component analysis (PLS/PCA), neural networks, and evolutionary algorithms such as the genetic algorithm. All-subsets regression (ASR) can find the best variable subset in a large pool. In this paper, a novel variable selection and modeling method based on prediction (VSMP) is developed. Two controllable parameters are introduced into the ASR to improve its performance: the interrelation coefficient between pairs of independent variables (r_int) and the correlation coefficient obtained by leave-one-out (LOO) cross-validation (q^2). The technique differs from other ASR-related variable selection procedures in two main features: (1) the search for the optimal subsets is controlled by the statistic q^2 or the root-mean-square error of prediction (RMSEP) from the LOO cross-validation step, rather than by the correlation coefficient obtained in the modeling step (r^2); (2) the search for all optimal subsets is accelerated by the statistic r_int together with q^2. A comparison of the results of VSMP applied to the Selwood data set (n = 31 compounds, m = 53 descriptors) with those obtained from alternative algorithms shows the good performance of the technique.
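
The two controls described above, a ceiling on the pairwise correlation between descriptors (r_int) and ranking by the LOO cross-validated q^2, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how VSMP prunes the search, so the sketch simply enumerates every subset of a fixed size, skips subsets containing a descriptor pair above the r_int threshold, and keeps the subset with the highest q^2. The function names `loo_q2` and `search_subsets`, the default threshold of 0.75, and the use of ordinary least squares with an intercept are all assumptions made for illustration. The LOO residuals are obtained from the standard leverage identity for least squares, so no model is refitted per compound.

```python
from itertools import combinations

import numpy as np


def loo_q2(X, y):
    """LOO cross-validated q^2 for an OLS model with intercept (illustrative)."""
    A = np.column_stack([np.ones(len(y)), X])
    # Thin QR gives an orthonormal basis Q of the column space of A.
    Q, _ = np.linalg.qr(A)
    residuals = y - Q @ (Q.T @ y)          # ordinary fit residuals
    leverage = np.sum(Q**2, axis=1)        # diagonal of the hat matrix
    # LOO residual_i = residual_i / (1 - leverage_i); assumes leverage_i < 1.
    press = np.sum((residuals / (1.0 - leverage)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot


def search_subsets(X, y, size, r_int_max=0.75):
    """Exhaustive search over subsets of `size` descriptors (hypothetical helper).

    Subsets containing any descriptor pair with |r| > r_int_max are skipped;
    the remaining subsets are ranked by LOO q^2 rather than by fitted r^2.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    best_subset, best_q2 = None, -np.inf
    for subset in combinations(range(X.shape[1]), size):
        if any(corr[i, j] > r_int_max for i, j in combinations(subset, 2)):
            continue  # intercorrelated pair: skip without fitting
        q2 = loo_q2(X[:, list(subset)], y)
        if q2 > best_q2:
            best_subset, best_q2 = subset, q2
    return best_subset, best_q2
```

Calling `search_subsets(X, y, size=3)` on a descriptor matrix `X` (compounds in rows) and an activity vector `y` returns the best admissible three-descriptor subset and its q^2. Exhaustive enumeration grows combinatorially with the number of descriptors, so on a pool of 53 descriptors it is practical only for small subset sizes.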
