Abstract

Integrated stock assessments specify a distribution for each of multiple data types, and these distributions control the relative leverage assigned to each datum. A decade of research has demonstrated that (1) proper data weighting is necessary to avoid bias resulting from overweighting noisy age- and length-composition data; (2) sampling data can be pre-processed to estimate the likely sampling variance for composition data; and (3) using random effects to estimate time-varying parameters can improve the fit to data while also changing statistical leverage, thereby serving a similar role to reweighting data. However, unresolved questions remain, including: (A) Is it more appropriate to model age and length data as proportions-at-age plus an index for the total, or as a series of indices-at-age? (B) Are correlated residuals appropriately addressed via data weighting, or do they require additional model changes (e.g., time-varying parameters)? (C) How can information about sampling imprecision and model errors be communicated efficiently between sampling and stock-assessment teams? (D) How does model-based expansion of sampling data affect data weighting? and (E) How should alternative hypotheses about the factors driving poor fits to data be addressed? Here, we argue that stock-assessment errors can be classified into four categories: sampling bias (e.g., changes in survey coverage), sampling imprecision (e.g., finite sample sizes), assessment-model bias (e.g., incorrect demographic assumptions), and assessment-model imprecision (e.g., random effects). This categorization has several implications with resulting practical recommendations. For example, we define Percent Excess Variance (PEV) from the ratio of the input sample size (which measures the variance due to sampling imprecision) to the effective sample size (which measures the variance of assessment-model residuals). We propose calculating PEV as a standardized diagnostic that measures the net effect of survey bias and of assessment-model bias and imprecision. We demonstrate PEV in a simulation experiment fitted using the Woods Hole Assessment Model (WHAM) conditioned upon Gulf of Alaska walleye pollock, where unacknowledged variation in fishery selectivity results in a PEV of 77%, which is eliminated when a time-varying estimation model is correctly specified. We also argue that model-based expansion of data inputs using auxiliary information can mitigate sampling bias, while also measuring sampling imprecision for spatially unrepresentative surveys. Similarly, including random effects can mitigate model bias while increasing model imprecision when the demographic model has little explanatory power. Finally, we observe that down-weighting compositional data for a given fleet fails to propagate information about model residuals when interpreting abundance indices or reference points for that same fleet. When PEV is large for important fleets, we therefore encourage focused research to explain the sources of these errors rather than simply down-weighting without propagating information about residuals. However, we acknowledge a continuing role for automated data weighting for less important fleets, although we recommend explicit hypotheses about potential sources of error in those cases.
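
To make the PEV diagnostic concrete, the sketch below computes it for a single composition observation. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses the McAllister–Ianelli effective-sample-size estimator and reads the abstract's definition as PEV = 100 × (1 − n_eff / n_input); all data values are hypothetical.

```python
import numpy as np

def mcallister_ianelli_neff(obs, pred):
    """Effective sample size for one composition observation
    (McAllister & Ianelli 1997): expected multinomial variance
    divided by the observed squared residuals."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return np.sum(pred * (1.0 - pred)) / np.sum((obs - pred) ** 2)

def percent_excess_variance(n_input, n_eff):
    """Residual variance in excess of sampling imprecision, assuming
    variance scales as 1/n: PEV = 100 * (1 - n_eff / n_input),
    floored at zero when the model fits better than expected."""
    return 100.0 * max(0.0, 1.0 - n_eff / n_input)

# Hypothetical proportions-at-age for one year of one fleet
obs = np.array([0.10, 0.30, 0.35, 0.15, 0.10])   # observed composition
pred = np.array([0.05, 0.20, 0.40, 0.25, 0.10])  # model-predicted composition
n_input = 200                                    # input sample size

n_eff = mcallister_ianelli_neff(obs, pred)
print(f"effective N = {n_eff:.1f}, "
      f"PEV = {percent_excess_variance(n_input, n_eff):.0f}%")
```

Under this assumed definition, the 77% PEV reported above would correspond to an effective sample size of roughly one quarter of the input sample size.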
