Handling missing data in ecological studies: Ignoring gaps in the dataset can distort the inference

Rafał Łopucki, Adam Kiersztyn, Grzegorz Pitucha, Ignacy Kitowski

doi:10.1016/j.ecolmodel.2022.109964

Abstract

Ecological datasets often contain gaps, outliers or even incorrect data. Ignoring the problem of missing data can lead to a reduction in the statistical power of the models used, biased parameter estimates and incorrect conclusions about the phenomenon studied. In this study, using simulated and real ecological data (seven-year monitoring of offspring production in 239 white stork (Ciconia ciconia) nests), we show how results differ when missing data are ignored, when the gaps are filled using single methods (including fuzzy methods) and when multiple-imputation and aggregation techniques are used. Based on simulated data, we showed that data gaps can be filled with high accuracy if the appropriate method (model) is used (96.5% of perfectly matched cases). Based on empirical data, we showed how results can differ depending on whether the missing data are accepted or filled. These differences concerned general indicators of breeding success (e.g. total number of offspring, mean annual productivity per nest), trends (e.g. increase or decrease in productivity between years) and more detailed analyses, such as ranks of the most productive nests. The observed differences in the results could lead to the formulation of incorrect conclusions about the state of the stork population, the condition of its habitat or conservation guidelines. We highlight that a well-developed set of data-imputation methods dedicated to monitoring the white stork could increase the accuracy of modern estimates and enable the re-analysis of rich historical data. Similar data pre-processing solutions to fill data gaps should also find wider application in other ecological research.

Full Text