Species distribution atlases often rely on volunteer effort to achieve their desired coverage, an activity now typically discussed, at least in academia, under the general theme of “citizen science”. Such data, however, are rarely without complex biases, particularly with respect to the estimation of trends in species’ distributions over many decades. The data of the Botanical Society of Britain and Ireland (BSBI) are no exception to this, and both careful thought in data aggregation (spatial, temporal, and taxonomic) and appropriate modelling procedures are required to overcome these challenges. We discuss these issues, with a primary focus on the statistical models that have been put forward to adjust for such biases. Such models include the Telfer method, various “reporting rate” approaches based on generalised linear models, the frequency scaling using local occupancy (“Frescalo”) model, occupancy models, and spatial smoothing methods. In each case the strengths and limitations in relation to estimating trends from distribution data with important time-varying biases are assessed. Various properties of BSBI data, in particular the increasing numbers of records at fine spatial and temporal scales over the past century, coupled with a general lack of re-visits to sites at such finer scales and the time-varying biases previously mentioned, imply that methods that can be sensibly applied at coarser levels are likely to be most appropriate for estimating accurate long-term trends in distributions. We conclude that Frescalo, which can be seen as a type of occupancy model where an adjustment for overlooked species is made in relation to spatial rather than temporal replication, whilst simultaneously adjusting for variable regional effort, is currently the most sophisticated tool for achieving this. 
Although adjustments to data collection practices that are accepted by the recording community may allow for a greater application of occupancy modelling or other approaches in the future, methods that seek accurate trends over the long term are necessarily limited either to scales at which various properties of the data in hand are most likely to be unbiased, or to scales at which the biases are well enough understood to be modelled accurately.