Abstract
This study assessed the use of consensus regression models, as compared to single multiple linear regression models, for the development of quantitative structure-activity relationships (QSARs). To provide a comparison, four data sets of varying size and complexity were analyzed: silastic membrane flux, toxicity of phenols to Tetrahymena pyriformis, acute toxicity to the fathead minnow, and flash point. For each data set, a genetic algorithm was used to develop a model population, and the performance of consensus models was compared to that of the best single model. Two consensus models were developed: one using the top 10 models, and the other using a subset of models chosen to provide maximal coverage of model space. The results highlight the ability of the genetic algorithm to develop predictive models from a large descriptor pool. However, the consensus models were shown to offer no significant improvement over single regression models, which are as statistically robust as the equivalent consensus models. Consensus models developed from a selection of the best QSARs were shown not to be superior to those developed from a selection of QSARs that are diverse in "model space". For the data sets analyzed in this study, and in light of the Organization for Economic Cooperation and Development principles for the validation of QSARs, the increase in model complexity when using consensus models does not seem warranted given the minimal improvement in model statistics.
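To make the comparison concrete, the following minimal sketch contrasts a single multiple linear regression model with a consensus of several such models. It is an illustration only, not the study's procedure: the data are synthetic, the descriptor subsets are hypothetical stand-ins for models that a genetic algorithm might select, and the unweighted mean used to combine predictions is an assumption, since the abstract does not state how the individual predictions were combined.

```python
import numpy as np

def fit_mlr(X, y):
    """Fit ordinary least squares with an intercept; return the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict with coefficients returned by fit_mlr."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def consensus_predict(models, X):
    """Unweighted mean of the predictions of several MLR models.

    Each model is a (descriptor_columns, coefficients) pair, so every
    model may use its own subset of the descriptor pool.
    """
    preds = [predict_mlr(coef, X[:, cols]) for cols, coef in models]
    return np.mean(preds, axis=0)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data: 60 "compounds", 6 "descriptors" (assumption, for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * X[:, 4] + rng.normal(scale=0.1, size=60)

# Hypothetical descriptor subsets standing in for a GA-derived model population.
subsets = [[0, 2, 4], [0, 2], [0, 2, 5], [2, 4]]
models = [(cols, fit_mlr(X[:, cols], y)) for cols in subsets]

best_cols, best_coef = models[0]
single_r2 = r2(y, predict_mlr(best_coef, X[:, best_cols]))
consensus_r2 = r2(y, consensus_predict(models, X))
print(f"single model R^2: {single_r2:.3f}, consensus R^2: {consensus_r2:.3f}")
```

Note that the consensus model depends on the union of all descriptors used by its member models, which is the added complexity that the abstract weighs against any gain in fit statistics. A fair comparison would also use an external test set or cross-validation rather than the training-set fit shown here.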