Abstract

Quantitative structure-activity relationship (QSAR) modelers often encounter the problem of multicollinearity owing to the availability of large numbers of computable molecular descriptors. Sparsity of the variables while using descriptors such as atom pairs increases the complexity. Three different predictor-thinning methods, namely, a modified Gram-Schmidt algorithm, a marginal soft thresholding algorithm, and LASSO (least absolute shrinkage and selection operator), were utilized to reduce the number of descriptors prior to developing linear models. Juvenile hormone (JH) activity of 304 compounds on Culex pipiens larvae was taken as the model data set, and predictor trimming of a large number of diverse descriptors comprising 268 global molecular descriptors (topostructural, topochemical, and geometrical), 13 quantum chemical descriptors, and 915 atom pairs (substructural counts) was applied prior to linear regression by the ridge regression method. The data set (N = 304) was split into five calibration data sets of random samples of sizes 60/110/160/210/260, and the remaining 244/194/144/94/44 compounds were used for validations. LASSO was not found to be a very effective method in handling a large set of descriptors because the number of predictors retained could not exceed the number of observations. The results indicated that the modified Gram-Schmidt algorithm could be used to trim the number of predictors in the global molecular descriptor set where collinearity of the descriptors was the major concern. On the contrary, the soft thresholding approach was found to be an effective tool in subset selection from a diverse set of descriptors having both sparsity and multicollinearity, as in the case of the combined set of atom pairs and global molecular descriptors. The final model developed after variable selection was dominated more by atom pairs, which indicated the important structural moieties that affect JH activity of the compounds. The success of the method reiterates the fact that QSAR or quantitative structure-property relationship (QSPR) models can be developed for a diverse set of compounds using properly parametrized and diverse sets of descriptors, of course, with the selection of the appropriate statistical tools.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.