Species density models from opportunistic citizen science data

Jay M Ver Hoef, Matt Higham, Robyn Angliss, Devin Johnson

doi:10.1111/2041-210x.13679

Abstract

With the advent of technology for data gathering and storage, opportunistic citizen science data are proliferating. Species distribution models (SDMs) aim to use species occurrence or abundance for ecological insights, prediction and management. We analysed a massive opportunistic dataset with over 100,000 records of incidental shipboard observations of marine mammals. Our overall goal was to create maps of species density from massive opportunistic data by using spatial regression for count data with an effort offset. We illustrate the method with two marine mammals in the Gulf of Alaska and Bering Sea. We counted the total number of animals in 11,424 hexagons based on presence‐only data. To decrease bias, we first estimated a spatial density surface for ship‐days, which was our proxy variable for effort. We used spatial considerations to create pseudo‐absences, and left some hexagons as missing values. Next, we created SDMs that used modelled effort to create pseudo‐absences, and included the effort surface as an offset in a second stage analysis of two example species, northern fur seals and Steller sea lions. For both effort and species counts, we used spatial count regression with random effects that had a multivariate normal distribution with a conditional autoregressive (CAR) covariance matrix, providing 2.5 million Markov chain Monte Carlo (MCMC) samples (1,000 were retained) from the posterior distribution. We used a novel MCMC scheme that maintained sparse precision matrices for observed and missing data when batch sampling from the multivariate normal distribution. We also used a truncated normal distribution to stabilize estimates, and used a look‐up table for sampling the autocorrelation parameter. These innovations allowed us to draw several million samples in just a few hours. From the posterior distributions of the SDMs, we computed two functions of interest. We normalized the SDMs and then applied an overall abundance estimate obtained from the literature to derive spatially explicit abundance estimates, especially within subsetted areas. We also created ‘certain hotspots’ by scaling local abundance by standard deviation and applying thresholds. Hexagons with values above a threshold were deemed hotspots, with enough evidence to be certain about them.
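The two posterior summaries described above can be illustrated with a minimal sketch. It assumes the retained MCMC output is available as an array of positive relative densities with one row per draw and one column per hexagon. The function names, the array layout, and the reading of the hotspot score as posterior mean divided by posterior standard deviation are assumptions made for illustration, not details confirmed by the abstract.

```python
import numpy as np


def regional_abundance(samples, total_abundance, region_mask):
    """Posterior draws of abundance inside a subset of hexagons.

    samples: (n_draws, n_hex) array of positive relative densities
             (hypothetical layout of the retained MCMC output).
    total_abundance: overall abundance estimate taken from the literature.
    region_mask: boolean array of length n_hex marking the subset area.
    """
    samples = np.asarray(samples, dtype=float)
    region_mask = np.asarray(region_mask, dtype=bool)
    # Normalize each draw so the hexagon values sum to 1.
    proportions = samples / samples.sum(axis=1, keepdims=True)
    # Scale by the external total, then sum over the hexagons in the region.
    return total_abundance * proportions[:, region_mask].sum(axis=1)


def certain_hotspots(samples, threshold):
    """Flag hexagons whose scaled abundance exceeds a threshold.

    Assumed scaling: posterior mean divided by posterior standard deviation.
    Hexagons with zero standard deviation get a score of 0 and are not flagged.
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    sd = samples.std(axis=0, ddof=1)
    score = np.divide(mean, sd, out=np.zeros_like(mean), where=sd > 0)
    return score > threshold


if __name__ == "__main__":
    # Simulated stand-in for 1,000 retained draws over 50 hexagons.
    rng = np.random.default_rng(0)
    fake_samples = rng.lognormal(mean=0.0, sigma=0.5, size=(1000, 50))
    mask = np.zeros(50, dtype=bool)
    mask[:10] = True  # hypothetical subset area
    draws = regional_abundance(fake_samples, total_abundance=50_000, region_mask=mask)
    print("regional abundance, posterior mean:", draws.mean())
    print("hexagons flagged:", int(certain_hotspots(fake_samples, threshold=2.5).sum()))
```

Normalizing within each draw, rather than normalizing the posterior mean once, carries the posterior uncertainty through to the regional abundance, so intervals for a subset area can be read directly from the returned draws.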

Full Text