Abstract

The categorization of multidimensional data into clusters is a common task in statistics. Many applications of clustering, including the majority of tasks in ecology, use data that is inherently spatial and is often also temporal. However, spatiotemporal dependence is typically ignored when clustering multivariate data. We present a finite mixture model for spatial and spatiotemporal clustering that incorporates spatial and spatiotemporal autocorrelation by including appropriate Gaussian processes (GPs) into a model for the mixing proportions. We also allow for flexible and semiparametric dependence on environmental covariates, once again using GPs. We propose to use Bayesian inference through three tiers of approximate methods: a Laplace approximation that allows efficient analysis of large datasets, and both partial and full Markov chain Monte Carlo (MCMC) approaches that improve accuracy at the cost of increased computational time. Comparison of the methods shows that the Laplace approximation is a useful alternative to the MCMC methods. A decadal analysis of 253 species of teleost fish from 854 samples collected along the biodiverse northwestern continental shelf of Australia between 1986 and 1997 shows the added clarity provided by accounting for spatial autocorrelation. For these data, the temporal dependence is comparatively small, which is an important finding given the changing human pressures over this time.

Highlights

  • Identifying regions of relative homogeneity in data is a common goal in most, and probably all, data-driven disciplines

  • To address the complexities introduced by the inclusion of spatial and spatiotemporal dependence, we introduce novel methods to conduct approximate Bayesian inference that scale well with both the number of samples and the dimensionality of those observations

  • We model the functions of covariates with additive, mutually independent Gaussian processes (GPs): $h_k(\mathbf{x}) \mid \theta_{h,k,1}, \ldots, \theta_{h,k,D} \sim \mathcal{GP}\left(0, \sum_{d=1}^{D} c_{h,k,d}(x_d, x_d' \mid \theta_{h,k,d})\right)$ (5)
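
The additive GP prior in the last highlight can be illustrated with a minimal sketch. It assumes a squared-exponential kernel for each covariate dimension; the kernel choice, the function names `squared_exponential` and `additive_covariance`, and all hyperparameter values are illustrative assumptions and are not taken from the paper. The sketch relies only on the fact that a sum of mutually independent zero-mean GPs, one per covariate, is itself a zero-mean GP whose covariance is the sum of the per-covariate covariances.

```python
import numpy as np


def squared_exponential(a, b, variance, length_scale):
    """Squared-exponential covariance between two 1-D input vectors.

    The kernel family is an assumption made for illustration only.
    """
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def additive_covariance(X1, X2, variances, length_scales):
    """Covariance of a sum of independent GPs, one per covariate column.

    Each column d of X1/X2 gets its own kernel with its own
    hyperparameters; the per-dimension covariances are summed.
    """
    K = np.zeros((X1.shape[0], X2.shape[0]))
    for d in range(X1.shape[1]):
        K += squared_exponential(
            X1[:, d], X2[:, d], variances[d], length_scales[d]
        )
    return K


# Hypothetical data: 5 sites, D = 3 environmental covariates.
rng = np.random.default_rng(0)
X = rng.uniform(size=(5, 3))
variances = np.array([1.0, 0.5, 2.0])       # illustrative values
length_scales = np.array([0.3, 1.0, 0.7])   # illustrative values

K = additive_covariance(X, X, variances, length_scales)

# One draw from the zero-mean prior; a small jitter keeps K numerically PSD.
h = rng.multivariate_normal(np.zeros(len(X)), K + 1e-9 * np.eye(len(X)))
```

Because each covariate has its own kernel and hyperparameters, the effect of one covariate can be examined separately from the others, which is one reason to prefer an additive structure over a single kernel on the full covariate vector.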


Summary

INTRODUCTION

Identifying regions of relative homogeneity in data is a common goal in most, and probably all, data-driven disciplines. Spatial and temporal dependence may arise among biological observations, but most cluster analyses ignore this possibility. In doing so, these studies inadvertently ignore the potential for spatiotemporal correlation to be confused with ecological groups. Unlike the previously introduced models, our approach allows for covariates and for spatiotemporal autocorrelation within the data. This is achieved in a single analysis, which avoids the problems of propagating uncertainty through multiple stages of an analysis. We analyze 854 samples of 253 species of teleost fish from the northwestern continental shelf (NWS) of Australia (see Figure 1) to test our methods on large real-world data, and we illustrate the effects of including spatial and spatiotemporal effects by fitting models with and without them. The temporal component may be important for this region, which has been subject to differing exploitation rates of fish as well as different resource management paradigms (Considine, 1985; Sainsbury et al., 1993).

NWS region
Biological data
Physical environment data
Spatiotemporal clustering model
Priors
Inferential methods
Identifiability of the parameters and inferring covariate effects
Model comparison via cross-validation
Tests with simulated data
Posterior inference and model comparison
The effect of covariates and the spatial term
Spatiotemporal analysis of the NWS data
SUMMARY AND DISCUSSION
