Dealing with data conflicts in statistical inference of population assessment models that integrate information from multiple diverse data sets

Mark N Maunder, Kevin R Piner

doi:10.1016/j.fishres.2016.04.022

Abstract

Contemporary fisheries stock assessments often use multiple diverse data sets to extract as much information as possible about biological and fishery processes. However, models are, by definition, simplifications of reality and are, therefore, misspecified. Model misspecification can degrade results when multiple data sets are analyzed simultaneously. The process, observation, and sampling components of the model must all be at least approximately correct to minimize bias. Unfortunately, even basic processes that are usually considered well understood (e.g., growth and selectivity) are misspecified in most, if not all, stock assessments. These misspecified processes, combined with the use of composition data, result in biased estimates of absolute abundance and abundance trends, which often become evident as “data conflicts.” This is compounded by the over-weighting of composition data in many assessments, owing to misuse of data-weighting approaches. The ‘law of conflicting data’ states that, since data are facts, conflicting data imply model misspecification; however, any apparent conflict must be interpreted in the context of random sampling error. Down-weighting (or dropping) conflicting data is not necessarily appropriate because doing so may not resolve the underlying model misspecification. Model misspecification and process variation can be accounted for in the variance parameters of the likelihoods (sampling error), but it is unclear when, or even if, this is appropriate. The appropriate method for dealing with data conflicts depends on whether they are caused by random sampling error, process variation, observation model misspecification, or misspecification of the system (dynamics) model. Diagnostic approaches are urgently needed to evaluate goodness of fit and to identify model misspecification.
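One way to judge whether a conflict is "in the context of random sampling error" is to check whether two independent estimates of the same quantity differ by more than their sampling errors would allow. The sketch below does this with a generic two-sided z-test. It is a standard statistical check, not a method from the paper. The function names `conflict_z` and `two_sided_p`, the assumption of normal, independent errors with known standard errors, and the example numbers are all hypothetical.

```python
import math

def conflict_z(est_a, se_a, est_b, se_b):
    """z-statistic for the difference between two independent estimates of
    the same quantity, assuming normal sampling errors with known SEs."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)

def two_sided_p(z):
    """Two-sided p-value for z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical example: an index-based and a composition-based estimate of
# the same log-scale abundance trend (numbers are illustrative only).
z = conflict_z(est_a=-0.30, se_a=0.10, est_b=0.05, se_b=0.12)
p = two_sided_p(z)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests that the two data sets disagree by more than sampling error alone would explain. Under the law of conflicting data, that points toward process variation or model misspecification. A large p-value means the apparent conflict is consistent with sampling noise. The test is only as good as its inputs: estimates produced by the same assessment model are usually correlated, and misstated standard errors will distort the result.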
We recommend external estimation of the sampling error variance in likelihood functions, modelling of process variation in integrated models, and internal estimation of the standard deviation of the process variation. The required statistical framework is computationally intensive, but practical approximations are available, computational algorithms are being improved, and computer power is increasing. We provide a framework for model development that identifies and corrects model misspecification, and we illustrate it using simulated data.
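The recommended division of labour can be sketched for a deliberately simple case. Sampling-error variance is fixed from an external estimate, and the standard deviation of process variation is estimated inside the model. In the sketch, an index $y_t$ varies around a constant mean, so $y_t \sim N(\mu, \mathrm{se}_t^2 + \sigma_p^2)$. The $\mathrm{se}_t$ are treated as known, and $\sigma_p$ and $\mu$ are estimated by maximum likelihood. This model is far simpler than an integrated stock assessment and is not the authors' framework. The functions `profile_negloglik` and `estimate_sigma_p`, the golden-section optimizer, and the simulated data are hypothetical illustrations.

```python
import math
import random

def profile_negloglik(sigma_p, y, se):
    """Negative log-likelihood of y_t ~ N(mu, se_t^2 + sigma_p^2) with mu
    profiled out; for fixed sigma_p the MLE of mu is the inverse-variance
    weighted mean. Returns (nll, mu_hat)."""
    var = [s * s + sigma_p * sigma_p for s in se]  # sampling + process variance
    w = [1.0 / v for v in var]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    nll = 0.0
    for yi, vi in zip(y, var):
        nll += 0.5 * (math.log(2.0 * math.pi * vi) + (yi - mu) ** 2 / vi)
    return nll, mu

def estimate_sigma_p(y, se, upper=5.0, tol=1e-8):
    """Golden-section search for sigma_p in [0, upper] minimizing the profile
    negative log-likelihood (assumes a single minimum on that interval)."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618
    a, b = 0.0, upper
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if profile_negloglik(c, y, se)[0] < profile_negloglik(d, y, se)[0]:
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    sigma_hat = 0.5 * (a + b)
    return sigma_hat, profile_negloglik(sigma_hat, y, se)[1]

# Simulated data (illustrative): true mean 1.0, true process SD 0.3, and
# externally estimated sampling SEs that differ among observations.
rng = random.Random(42)
true_mu, true_sigma_p = 1.0, 0.3
se = [rng.uniform(0.1, 0.4) for _ in range(200)]
y = [true_mu + rng.gauss(0.0, true_sigma_p) + rng.gauss(0.0, s) for s in se]

sigma_hat, mu_hat = estimate_sigma_p(y, se)
print(f"sigma_p estimate = {sigma_hat:.3f}, mu estimate = {mu_hat:.3f}")
```

Because the process deviations and sampling errors are both assumed normal and independent, their sum is normal and the marginal likelihood has a closed form. In a full integrated model, process variation enters as random effects inside nonlinear population dynamics, and the marginal likelihood usually has to be approximated. One common choice is the Laplace approximation. This is one reason such frameworks are computationally intensive.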

Full Text