Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models

Tianxiao Hao, Gurutzeta Guillera‐Arroita, José J Lahoz‐Monfort, Jane Elith

doi:10.1111/ecog.04890


Open Access

https://doi.org/10.1111/ecog.04890


Journal: Ecography
Publication Date: Jan 27, 2020
Citations: 227
License type: CC BY 3.0

Affiliation: University of Melbourne

Abstract

Predictive performance is important to many applications of species distribution models (SDMs). The SDM ‘ensemble’ approach, which combines predictions across different modelling methods, is believed to improve predictive performance, and is used in many recent SDM studies. Here, we aim to compare the predictive performance of ensemble species distribution models to that of individual models, using a large presence–absence dataset of eucalypt tree species. To test model performance, we divided our dataset into calibration and evaluation folds using two spatial blocking strategies (checkerboard‐pattern and latitudinal slicing). We calibrated and cross‐validated all models within the calibration folds, using both repeated random division of data (a common approach) and spatial blocking. Ensembles were built using the software package ‘biomod2’, with standard (‘untuned’) settings. Boosted regression tree (BRT) models were also fitted to the same data, tuned according to published procedures. We then used evaluation folds to compare ensembles against both their component untuned individual models, and against the BRTs. We used area under the receiver‐operating characteristic curve (AUC) and log‐likelihood for assessing model performance. In all our tests, ensemble models performed well, but not consistently better than their component untuned individual models or tuned BRTs across all tests. Moreover, choosing untuned individual models with best cross‐validation performance also yielded good external performance, with blocked cross‐validation proving better suited for this choice, in this study, than repeated random cross‐validation. The latitudinal slice test was only possible for four species; this showed some individual models, and particularly the tuned one, performing better than ensembles. This study shows no particular benefit to using ensembles over individual tuned models. It also suggests that further robust testing of performance is required for situations where models are used to predict to distant places or environments.
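The evaluation design described in the abstract can be illustrated with a minimal sketch. It is not the authors' code (the study used R and biomod2); the function names, the coordinate units and the `block_size` parameter are assumptions made for illustration. The sketch assigns sites to two folds in a checkerboard pattern and computes the two performance measures named above, AUC and log-likelihood, for presence–absence data.

```python
import math


def checkerboard_fold(x, y, block_size):
    """Return fold 0 or 1 for a site, alternating between adjacent square blocks.

    x, y and block_size are assumed to share the same units (e.g. km).
    """
    col = math.floor(x / block_size)
    row = math.floor(y / block_size)
    return (col + row) % 2


def auc(y_true, y_prob):
    """AUC as the share of presence/absence pairs ranked correctly (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bernoulli_log_likelihood(y_true, y_prob, eps=1e-12):
    """Sum of Bernoulli log-likelihoods; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if t == 1 else math.log(1.0 - p)
    return total


# Hypothetical data: four sites with observed presence (1) or absence (0)
# and predicted probabilities of presence from some fitted model.
observed = [1, 1, 0, 0]
predicted = [0.9, 0.4, 0.6, 0.1]
print(auc(observed, predicted))
print(bernoulli_log_likelihood(observed, predicted))
```

Higher values are better for both measures. AUC only reflects how well predictions rank presences above absences, whereas the log-likelihood also penalises poorly calibrated probabilities, which is one reason to report both.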

Highlights

Species distribution models (SDMs), also known as ecological niche models or habitat suitability models, are models that fit species–environment relationships to explain and predict distributions of species.
Boosted regression tree (BRT) models were fitted to the same data, tuned according to published procedures.
Across all species, blocked internal CVs tended to yield similar area under the receiver-operating characteristic curve (AUC) values to those produced by external validations, with a slightly lower performance estimate internally, understandable given that the models in internal tests were fitted to less data.

Summary

Introduction

Species distribution models (SDMs), also known as ecological niche models or habitat suitability models, are models that fit species–environment relationships to explain and predict distributions of species. A range of modelling algorithms is available for building SDMs (e.g. generalised linear models, regression trees, Maxent; Elith et al 2006). It is not necessarily straightforward for users of SDMs to decide which algorithm is optimal for their situation (Elith and Graham 2009). We note here that this is a particular use of the word ‘ensemble’: it considers ensembles across modelling methods. This excludes machine learning methods such as random forests and boosted regression trees (Hastie et al 2009), which are in one sense an ensemble, but are different conceptually and based on just one model type (decision trees). In this paper we consider random forests and boosted regression trees as ‘individual’ models.
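To make the sense of ‘ensemble’ used here concrete, the sketch below combines per-site predictions from several different modelling methods by taking an unweighted mean. This is only one possible combination rule and is an assumption for illustration; it is not a reproduction of the biomod2 settings used in the study, and the model names and values are hypothetical.

```python
def mean_ensemble(predictions_by_model):
    """Average predicted probabilities across models, site by site (unweighted).

    predictions_by_model maps a model name to a list of per-site predictions;
    all lists must cover the same sites in the same order.
    """
    models = list(predictions_by_model.values())
    if not models:
        raise ValueError("at least one model is required")
    n_sites = len(models[0])
    if any(len(m) != n_sites for m in models):
        raise ValueError("all models must predict to the same sites")
    return [sum(m[i] for m in models) / len(models) for i in range(n_sites)]


# Hypothetical predictions from three different modelling methods at two sites.
predictions = {
    "glm": [0.25, 0.75],
    "random_forest": [0.50, 0.50],
    "maxent": [0.75, 0.25],
}
print(mean_ensemble(predictions))
```

Under this definition a random forest counts as a single component of the ensemble, even though it is itself built from many trees, which matches the distinction drawn in the paragraph above.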

