Abstract
Scientists need appropriate spatial‐statistical models to account for the unique features of stream network data. Recent advances provide a growing methodological toolbox for modelling these data, but general‐purpose statistical software has only recently emerged, with little information about when to use different approaches. We implemented a simulation study to evaluate and validate geostatistical models that use continuous distances, and penalised spline models that use a finite discrete approximation for stream networks. Data were simulated from the geostatistical model, with performance measured by empirical prediction and fixed effects estimation. We found that both models were comparable in terms of squared error, with a slight advantage for the geostatistical models. Generally, both methods were unbiased and had valid confidence intervals. The most marked differences were found for confidence intervals on fixed‐effect parameter estimates, where, for small sample sizes, the spline models underestimated variance. However, the penalised spline models were always more computationally efficient, which may be important for real‐time prediction and estimation. Thus, decisions about which method to use must be influenced by the size and format of the data set, in addition to the characteristics of the environmental process and the modelling goals. ©2015 The Authors. Environmetrics published by John Wiley & Sons, Ltd.
Highlights
Large data sets collected on streams and rivers are becoming more common because of broad-scale environmental-monitoring programs
Similar patterns were visible across the different simulated spatial structures; the lowest root-mean-squared prediction error (RMSPE) was associated with data exhibiting a long spatial range (r = 0.30) and a weak linear component (k1 = 0.1), while the highest RMSPE was associated with short spatial range and a dominant linear component
Two different spatial statistical approaches used to model stream network data were compared across a wide variety of simulated data
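The RMSPE quoted in the highlights is the square root of the mean squared difference between held-out observations and their predictions. A minimal sketch of that calculation is given below; the function name `rmspe` and its arguments are illustrative and are not taken from the paper or from any stream-network package.

```python
import math


def rmspe(observed, predicted):
    """Root-mean-squared prediction error between two equal-length sequences.

    `observed` and `predicted` are hypothetical names for the held-out
    values and the model predictions at the same locations.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Lower values indicate predictions that lie closer, on average, to the held-out observations.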
Summary
Large data sets collected on streams and rivers are becoming more common because of broad-scale environmental-monitoring programs. These data sets often include measurements such as dissolved pollutant concentrations, stream temperature and measures of biodiversity (e.g. counts of birds and insects), which are collected across the branching stream network. These data are often used to address vital questions pertaining to the effects of climate change on habitat and species distributions, as well as other anthropogenic impacts on instream habitat and aquatic pollution. Observations collected in this way are typically spatially dependent; in conventional geostatistical models, this dependence is accounted for by including an additional spatial process in the model specification, whose covariance matrix is populated by some appropriate function of the Euclidean separation between pairs of observations.
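To make the covariance construction concrete, the sketch below fills a covariance matrix from pairwise Euclidean distances. It assumes an exponential covariance function, which is only one common choice and is not stated in the text above; the parameter names `sill`, `range_` and `nugget` are likewise illustrative. It shows the conventional Euclidean approach only, not the stream-network models compared in the study.

```python
import math


def exponential_covariance(coords, sill=1.0, range_=0.3, nugget=0.0):
    """Covariance matrix from pairwise Euclidean distances (assumed form).

    coords : list of (x, y) tuples giving observation locations.
    sill   : variance of the spatial process (hypothetical name).
    range_ : distance scale of the exponential decay (hypothetical name).
    nugget : independent error variance added on the diagonal.
    """
    n = len(coords)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])  # Euclidean separation
            cov[i][j] = sill * math.exp(-d / range_)
            if i == j:
                cov[i][j] += nugget
    return cov
```

For example, `exponential_covariance([(0.0, 0.0), (0.3, 0.0), (0.0, 0.4)])` returns a symmetric 3 × 3 matrix whose off-diagonal entries shrink as the separation between sites grows.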