Machine learning of large-scale spatial distributions of wild turkeys with high-dimensional environmental data.

Annie Farrell, James A Martin, Scott A Rush, Guiming Wang, Dave Godwin, Adam B Butler, Jerrold L Belant

doi:10.1002/ece3.5177

Abstract

Species distribution modeling often involves high‐dimensional environmental data. Large amounts of data and multicollinearity among covariates make variable selection difficult for statistical models and thus undermine reliable inference about the effects of environmental factors on the spatial distribution of species. Few studies have evaluated and compared the performance of multiple machine learning (ML) models in handling multicollinearity. Here, we assessed the effectiveness of removing correlated covariates and of regularization in coping with multicollinearity in ML models of habitat suitability. Three ML algorithms, maximum entropy (MaxEnt), random forests (RFs), and support vector machines (SVMs), were applied to the original data (OD) of 27 landscape variables, to reduced data (RD) from which 14 highly correlated covariates were removed, and to 15 principal components (PC) of the OD accounting for 90% of the original variability. The performance of the three ML models was measured with the area under the receiver operating characteristic curve (AUC) and the continuous Boyce index (CBI). We collected 663 nonduplicated presence locations of Eastern wild turkeys (Meleagris gallopavo silvestris) across the state of Mississippi, United States. Of these, 453 locations separated by ≥2 km were used to train the three ML algorithms on the OD, RD, and PC data, respectively. The remaining 210 locations were used to validate the trained ML models and measure their performance. All three ML models performed excellently on the RD and PC data. MaxEnt and SVMs performed well on the OD data, indicating that their default regularization settings handled multicollinearity adequately. Weak learning of RFs through bagging appeared to alleviate multicollinearity and resulted in excellent performance on the OD data. Regularization of ML algorithms may help exploratory studies of the effects of environmental factors on the spatial distribution and habitat suitability of wildlife.
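The abstract states that the 453 training locations were separated by at least 2 km, with the remaining 210 used for validation, but it does not say how that subset was selected. As a minimal sketch, assuming a greedy thinning pass over latitude/longitude coordinates with great-circle (haversine) distances on a spherical Earth, the split could look like the following. `haversine_km` and `thin_and_split` are hypothetical names, and the coordinates in the usage block are made up for illustration, not the study's data.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumes a spherical Earth)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against `a` slightly exceeding 1 through rounding error.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def thin_and_split(points, min_km=2.0):
    """Greedy spatial thinning of presence locations (hypothetical helper).

    A point is kept for training only if it lies at least `min_km` from every
    point kept so far; all other points are returned as a held-out set.
    """
    train, held_out = [], []
    for p in points:
        if all(haversine_km(p, k) >= min_km for k in train):
            train.append(p)
        else:
            held_out.append(p)
    return train, held_out


if __name__ == "__main__":
    # Hypothetical coordinates in central Mississippi, not the study's data.
    pts = [(32.30, -89.40), (32.31, -89.40), (32.33, -89.40), (32.40, -89.40)]
    train, held_out = thin_and_split(pts)
    print(len(train), "training,", len(held_out), "held out")
```

Because a greedy pass keeps the first point it meets in each cluster, the retained subset depends on input order; shuffling the points would give a different, equally valid subset.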

Highlights

Studies of the spatiotemporal distribution of resources that support organisms are indispensable for understanding the dynamics of animal populations, including avian populations, across space and time (Fuller, 2012).
We evaluated the predictive accuracy of ecological niche factor analysis (ENFA), random forests (RFs), maximum entropy (MaxEnt), and support vector machines (SVMs) on the same test data (210 nonduplicated presence locations) using the continuous Boyce index (CBI; Boyce, Vernier, Nielsen, & Schmiegelow, 2002; Hirzel, Lay, Helfer, Randin, & Guisan, 2006).
This study assessed the effectiveness of two approaches to multicollinearity, correlation removal and principal components, on the predictive performance of maximum entropy (MaxEnt), random forests (RFs), and support vector machines (SVMs) for habitat suitability modeling.
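The continuous Boyce index cited in the highlights (Boyce et al., 2002; Hirzel et al., 2006) slides a window along the range of predicted suitability, compares in each window the share of validation presences (P) with the share of the study area (E), and takes the Spearman rank correlation between the ratio P/E and window position. The sketch below is a simplified version under stated assumptions: the study area is represented by predictions at sampled background cells, the window width (10% of the score range) and the 101 window positions are assumed defaults, and unlike some published implementations it does not drop repeated P/E values before ranking. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / math.sqrt(vx * vy)


def continuous_boyce(presence_scores, background_scores, window=0.1, n_windows=101):
    """Simplified continuous Boyce index (hypothetical helper).

    presence_scores: predicted suitability at validation presence locations.
    background_scores: predicted suitability at cells sampled across the
        study area, used as a proxy for the area in each suitability class.
    window: moving-window width as a fraction of the suitability range.
    """
    all_scores = list(presence_scores) + list(background_scores)
    lo, hi = min(all_scores), max(all_scores)
    width = window * (hi - lo)
    step = (hi - lo - width) / (n_windows - 1)
    mids, ratios = [], []
    for i in range(n_windows):
        start = lo + i * step
        end = start + width
        # P: share of validation presences whose score falls in the window.
        p = sum(start <= v <= end for v in presence_scores) / len(presence_scores)
        # E: share of background cells whose score falls in the window.
        e = sum(start <= v <= end for v in background_scores) / len(background_scores)
        if e > 0:  # P/E is undefined for windows with no background cells
            mids.append(start + width / 2)
            ratios.append(p / e)
    # CBI: rank correlation between window position and the P/E ratio.
    return spearman(mids, ratios)
```

Values near 1 indicate that validation presences fall more often than expected by chance in areas predicted as highly suitable; values near 0 indicate a model no better than random, and negative values indicate a model that ranks suitability the wrong way round.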

Summary

INTRODUCTION

Studies of the spatiotemporal distribution of resources that support organisms are indispensable for understanding the dynamics of animal populations, including avian populations, across space and time (Fuller, 2012). Habitat suitability mapping often uses a large number of landscape variables (e.g., 10 or more) to predict habitat suitability. Many of these landscape variables are highly correlated with one another, leading to multicollinearity in habitat and resource selection models (Aebischer, Robertson, & Kenward, 1993; Cutler et al., 2007). Drake et al. (2006) demonstrated that, in SVMs, unprocessed data (method 1) and orthogonal transformation (method 2) performed better than correlation removal (method 3). Random forests may alleviate multicollinearity by considering only a randomized subset of explanatory variables at each split when growing each tree (Cutler et al., 2007). We first developed statewide habitat suitability maps from a large sample of presence data (e.g., 600–700 presence locations) using MaxEnt, RFs, and SVMs. Second, we compared the predictive performance of MaxEnt, RFs, and SVMs between the correlation removal and principal component approaches to multicollinearity. Ecological studies have not extensively exploited the strong performance of SVMs in pattern identification and recognition or their capacity to analyze large amounts of data and complex relationships (Huettmann et al., 2018).
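Correlation removal, the approach behind the reduced (RD) data set, can be illustrated with a short greedy filter. The page does not give the authors' correlation threshold or their rule for choosing which member of a correlated pair to keep, so this sketch assumes a hypothetical cutoff of |r| > 0.7 and keeps covariates in the order they are listed; `pearson`, `drop_correlated`, and the covariate names in the test are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant covariate")
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.7):
    """Greedy correlation removal (hypothetical helper).

    columns: dict mapping covariate name -> values at the same locations.
    Covariates are visited in insertion order; one is dropped if its absolute
    correlation with any already-kept covariate exceeds `threshold`.
    """
    kept, dropped = [], []
    for name, values in columns.items():
        if any(abs(pearson(values, columns[k])) > threshold for k in kept):
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped
```

The order of the input matters: a covariate listed earlier wins over a later one it is correlated with, so in practice the list would be ordered by ecological relevance or another explicit priority.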

METHODS

DISCUSSION

Findings

CONFLICT OF INTEREST

Journal: Ecology and Evolution
Publication Date: Apr 24, 2019
Citations: 51
License type: CC BY 4.0

Similar Papers

Machine Learning Model Based on Prognostic Nutritional Index for Predicting Long-Term Outcomes in Patients With HCC Undergoing Ablation.
Nan Zhang ... Bowen Zhuang
Cancer medicine | VOL. 13
01 Oct 2024

Effective prediction of lost circulation from multiple drilling variables: a class imbalance problem for machine and deep learning algorithms
David A Wood ... Sajjad Mardanirad
Journal of Petroleum Exploration and Production Technology | VOL. 12
08 Dec 2021

Predicting Visual Acuity Responses to Anti-VEGF Treatment in the Comparison of Age-related Macular Degeneration Treatments Trials Using Machine Learning
Rajat S. Chandra ... Gui-shuang Ying
Ophthalmology. Retina | VOL. 8
24 Nov 2023

Predicting 30-Day Non-Seizure Outcomes Following Temporal Lobectomy with Personalized Machine Learning Models
Mert Karabacak ... Konstantinos Margetis
World Neurosurgery | VOL. 183
23 Nov 2023