Abstract

Multiple regression analysis remains a quantitative tool used extensively in the ecological literature. Consequently, methods for model selection and validation are important considerations, yet ecologists appear to pay little attention to how the choice of method can influence the outcome and interpretation of their results. In this study, we review commonly employed model selection and validation methods and use a Monte Carlo simulation approach to evaluate how accurately they identify the variables that belong in the final regression model and how accurately they estimate model prediction error. We found that all methods of model selection erroneously excluded or included variables in the final model, and that the error rate depended on sample size and the number of predictor variables. In general, forward selection, backward elimination and stepwise selection performed better with small sample sizes, whereas a modified bootstrap approach outperformed other methods with larger sample sizes. Model selection using all-subsets (exhaustive) search was highly biased, at times never selecting the correct predictor variables. Methods for model validation were also highly biased, with resubstitution and data-splitting (i.e., dividing the data into training and test samples) techniques producing biased and variable estimates of model prediction error. In contrast, jackknife validation was generally unbiased. Using an empirical example, we show that the interpretation of the ecological relationships between fish species richness and lake habitat is highly dependent on the type of model selection and validation method employed. The fact that model selection is frequently unsuited to determining correct ecological relationships, and that traditional approaches for model validation over-estimate the strength and value of our empirical models, is a major concern.
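
The contrast between resubstitution and jackknife validation can be illustrated with a minimal sketch. It is not the study's simulation design: it assumes a single predictor rather than multiple regression, treats jackknife validation as leave-one-out refitting, and uses hypothetical names and settings throughout (`fit_line`, `simulate`, the intercept 2.0, slope 0.5, unit noise variance, n = 15).

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def resubstitution_mse(xs, ys):
    """Mean squared error measured on the same data used to fit the model."""
    a, b = fit_line(xs, ys)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def jackknife_mse(xs, ys):
    """Leave-one-out error: refit without observation i, then predict it."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total / len(xs)


def simulate(n=15, reps=200, seed=1):
    """Average both error estimates over repeated simulated data sets.

    The data-generating model (intercept 2.0, slope 0.5, noise variance 1.0)
    is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    resub_total = 0.0
    jack_total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [2.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]
        resub_total += resubstitution_mse(xs, ys)
        jack_total += jackknife_mse(xs, ys)
    return resub_total / reps, jack_total / reps


if __name__ == "__main__":
    resub, jack = simulate()
    print("mean resubstitution MSE:", resub)
    print("mean jackknife MSE:     ", jack)
```

For a least-squares fit, the resubstitution estimate can never exceed the leave-one-out estimate on the same data, because each held-out residual equals the in-sample residual divided by one minus that observation's leverage. The size of the gap in this sketch depends on the assumed sample size and noise level and should not be read as a result of the study.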
