Abstract

Regional regression models can be employed to estimate hydrologic statistics at ungauged river sites. The increased availability of watershed characteristics and the use of GIS-based methods to process these data have created situations where a large number of highly correlated watershed characteristics are available as potential model explanatory variables. Applying ordinary least squares (OLS) regression to highly correlated explanatory variables leads to multicollinearity problems: highly sensitive parameter estimators with inflated variances, and improper model selection. A Monte Carlo simulation is developed to compare four techniques for handling multicollinearity: OLS, OLS with Variance Inflation Factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). Results show that the impact of multicollinearity is magnified at smaller sample sizes, higher correlations, and larger model error variances. Although PCR and PLS yield parameter estimators with reduced variances, they offer no improvement in model prediction compared to OLS and VIF. Using VIF to screen variables produced small improvements in model predictions. The use of OLS appears warranted, as the complexity of using biased regression techniques to address multicollinearity does little to improve model predictions.
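As a rough illustration of the VIF screening idea mentioned above, the following sketch computes variance inflation factors for a small synthetic data set. It is not the simulation design used in the study: the sample size, the correlation of 0.95, the three-variable layout, and the function name `variance_inflation_factors` are all assumptions chosen for illustration. It relies on the standard identity that the VIF of variable j, 1 / (1 - R_j^2), equals the j-th diagonal entry of the inverse correlation matrix of the explanatory variables.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (n samples by p variables).

    Hypothetical helper for illustration; not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    # Sample correlation matrix of the explanatory variables.
    R = np.corrcoef(X, rowvar=False)
    # The j-th diagonal entry of inv(R) equals 1 / (1 - R_j^2), where
    # R_j^2 comes from regressing variable j on all the others.
    return np.diag(np.linalg.inv(R))


# Synthetic example (assumed values): two strongly correlated
# variables plus one independent variable.
rng = np.random.default_rng(0)
n, rho = 50, 0.95
cov = np.array([[1.0, rho, 0.0],
                [rho, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
X = rng.multivariate_normal(np.zeros(3), cov, size=n)

vifs = variance_inflation_factors(X)
print(vifs)
```

In a screening step, variables whose VIF exceeds a chosen threshold would be dropped before fitting OLS. The threshold is a modelling choice and is not specified here.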
