A diagnostic modelling framework to construct indices of biotic integrity: A case study of fish in the Zeeschelde estuary (Belgium)

Paul Quataert, Pieter Verschelde, Jan Breine, Geert Verbeke, Els Goetghebeur, Frans Ollevier

doi:10.1016/j.ecss.2011.06.014

Abstract

We propose a coherent regression model-building framework to construct fish-based indices. More specifically, we concentrate on the selection of an optimal set of metrics, which remains a difficult problem. The paper starts from the observation that an index of biotic integrity (IBI) is analogous to a diagnostic model in medicine that assesses the health condition of a patient from a series of biomarkers. In the same vein, an IBI is a diagnostic model predicting the ecosystem condition of a site from a set of (scored) metrics. Metrics are community attributes sensitive to anthropogenic pressure, and their scores express the “distance to target” relative to a reference condition. In a medical context, Receiver Operating Characteristic (ROC) curves are commonly used to assess the diagnostic accuracy of laboratory tests. An ROC curve plots the sensitivity of a test (Se; the capacity to detect a disease or degradation) as a function of its false positive fraction (FPF), which is the complement of the specificity (Sp = 1 – FPF; the capacity to recognise a healthy person or a reference condition). The ROC curve represents the strength of the index to discriminate between degraded and reference sites. Higher curves correspond to stronger tests because a higher sensitivity can then be combined with a lower false positive fraction. Hence, it is natural to use summary statistics of the ROC curve as criteria to optimise medical tests or biotic indices. In this paper, we illustrate the value of this modelling framework with a case study in the Zeeschelde estuary in Belgium. In essence, a “traditional” IBI is an average of metrics scoring relevant properties of the ecosystem. We demonstrate that this average score model (AVG) is a special member of the more flexible predictive logistic model (PLM) family. The selection of a set of metrics thus becomes equivalent to variable selection in statistical model building. We apply model-building techniques such as best subsets regression to facilitate the search for an optimal suite of metrics from a candidate set, and we use cross-validation to avoid overfitting. The results show that a few metrics suffice to discriminate between most-impacted and least-impacted sites.
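To make the ROC terminology above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a higher index score means better ecological condition, so a site is flagged as degraded when its score falls at or below a threshold. The function names (`roc_points`, `auc`) and the eight site scores are hypothetical and chosen only for illustration. The area under the curve (AUC) is used here as one possible ROC summary statistic.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               degraded: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (FPF, Se) pairs, one per candidate threshold.

    Assumption: a site is flagged as degraded when score <= threshold.
    """
    n_deg = sum(degraded)            # number of truly degraded sites
    n_ref = len(degraded) - n_deg    # number of reference sites
    points = [(0.0, 0.0)]            # threshold below every score: nothing flagged
    for t in sorted(set(scores)):
        flagged = [s <= t for s in scores]
        # Sensitivity: share of degraded sites that are flagged.
        se = sum(f and d for f, d in zip(flagged, degraded)) / n_deg
        # False positive fraction: share of reference sites that are flagged.
        fpf = sum(f and not d for f, d in zip(flagged, degraded)) / n_ref
        points.append((fpf, se))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical index scores: four degraded sites followed by four reference sites.
scores = [0.2, 0.3, 0.5, 0.4, 0.6, 0.8, 0.45, 0.9]
degraded = [True, True, True, True, False, False, False, False]

print(auc(roc_points(scores, degraded)))  # 0.9375
```

In this toy data set, 15 of the 16 degraded/reference pairs are ranked correctly by the score, which matches the AUC of 0.9375. A candidate set of metrics yielding a higher AUC would, under these assumptions, discriminate better between degraded and reference sites.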

Full Text