Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables.

Tomislav Hengl, Gerard B M Heuvelink, Madlene Nussbaum, Marvin N Wright, Benedikt Gräler

doi:10.7717/peerj.5518

Abstract

Random forest and similar machine learning techniques are already used to generate spatial predictions, but the spatial location of points (geography) is often ignored in the modeling process. Spatial autocorrelation, especially if it persists in the cross-validation residuals, indicates that the predictions may be biased, which is suboptimal. This paper presents a random forest for spatial predictions framework (RFsp) in which buffer distances from observation points are used as explanatory variables, thus incorporating geographical proximity effects into the prediction process. The RFsp framework is illustrated with examples that use textbook datasets and apply spatial and spatio-temporal prediction to numeric, binary, categorical, multivariate and spatiotemporal variables. The performance of the RFsp framework is compared with state-of-the-art kriging techniques using fivefold cross-validation with refitting. The results show that RFsp can obtain predictions that are as accurate and unbiased as those of different versions of kriging. The advantages of RFsp over kriging are that it needs no rigid statistical assumptions about the distribution and stationarity of the target variable, it is more flexible towards incorporating, combining and extending covariates of different types, and it possibly yields more informative maps characterizing the prediction error. RFsp appears to be especially attractive for building multivariate spatial prediction models that can be used as “knowledge engines” in various geoscience fields. Some disadvantages of RFsp are that its computational intensity grows exponentially as calibration data and covariates increase, and that its predictions are highly sensitive to input data quality.
The key to the success of the RFsp framework might be the training data quality, especially the quality of spatial sampling (to minimize extrapolation problems and any type of bias in the data) and the quality of model validation (to ensure that accuracy is not affected by overfitting). For many data sets, especially those with fewer points and covariates and close-to-linear relationships, model-based geostatistics can still lead to more accurate predictions than RFsp.
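
The buffer-distance idea described in the abstract can be sketched in a few lines. This is a minimal illustration only, assuming Euclidean distances in a projected coordinate system; the function name `buffer_distances` and the coordinates are hypothetical and do not reproduce the authors' implementation.

```python
import math

def buffer_distances(obs_xy, target_xy):
    """Return one row per target location; column j holds the Euclidean
    distance from that target to observation point j."""
    return [
        [math.hypot(tx - ox, ty - oy) for (ox, oy) in obs_xy]
        for (tx, ty) in target_xy
    ]

# Hypothetical coordinates, for illustration only.
obs_xy = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]   # observation points
grid_xy = [(0.0, 0.0), (3.0, 0.0)]              # prediction locations

# Training covariates: an n x n matrix with a zero diagonal.
train_features = buffer_distances(obs_xy, obs_xy)
# Prediction covariates: an m x n matrix with the same column order.
grid_features = buffer_distances(obs_xy, grid_xy)
```

Each column could then be passed to a random forest as one covariate, alone or alongside other covariates. Note that the number of distance columns equals the number of observation points, so the feature matrix grows with the size of the calibration data.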

Highlights

Kriging and its many variants have been used as the Best Unbiased Linear Prediction technique for spatial points since the 1960s (Isaaks and Srivastava, 1989; Cressie, 1990; Goovaerts, 1997).
We have shown that random forest can be used to generate unbiased spatial predictions and to model and map uncertainty.
The advantages of random forest over linear geostatistical modeling and techniques such as kriging lie in the fact that no stationarity assumptions need to be met, and there is no need to specify transformation or anisotropy parameters (or to fit variograms at all!).

Summary

Introduction

Kriging and its many variants have been used as the Best Unbiased Linear Prediction technique for spatial points since the 1960s (Isaaks and Srivastava, 1989; Cressie, 1990; Goovaerts, 1997). In this paper we describe a generic framework for spatial and spatiotemporal prediction that is based on random forest and which we refer to as “RFsp”. With this framework we aim to include information derived from the observation locations and their spatial distribution in predictive modeling. We explain in detail (using standard data sets) how to extend machine learning to general spatial prediction, and compare the prediction efficiency of random forest with that of state-of-the-art kriging methods using fivefold cross-validation, refitting the model in each subset (in the case of spatiotemporal kriging, without refitting). All datasets used in this paper are either part of an existing R package or can be obtained from the GitHub repository
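
Fivefold cross-validation with refitting, as mentioned above, can be sketched as follows. This is an assumption-laden illustration: the helper names (`kfold_indices`, `cross_validate`, `fit_mean`, `predict_mean`) are hypothetical, the data are made up, and a training-mean predictor stands in for random forest or kriging so the example stays self-contained.

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and split the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Refit the model on each training subset and return the
    out-of-fold prediction for every observation."""
    preds = [None] * len(ys)
    for fold in kfold_indices(len(ys), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(ys)) if i not in held_out]
        # The model is refitted from scratch in every fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = predict(model, xs[i])
    return preds

# Placeholder model (an assumption): always predicts the training mean.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

# Made-up data, for illustration only.
xs = [[float(i)] for i in range(10)]
ys = [float(i) for i in range(10)]

preds = cross_validate(xs, ys, fit_mean, predict_mean)
rmse = (sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)) ** 0.5
```

Any model exposing a fit step and a predict step could be swapped in for the placeholder, which makes it possible to compare methods on identical folds.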

Journal: PeerJ	Publication Date: Aug 29, 2018
Citations: 570	License type: CC BY 4.0

Similar Papers

Combination of Fuzzy Logic and Kriging Technique Under Uncertainty for Spatial Data Prediction
Safa Ibrahim ... Ghanim Dhaher
Journal of Education and Science | VOL. 31
01 Sep 2022

A machine learning and geostatistical hybrid method to improve spatial prediction accuracy of soil potentially toxic elements
Abiot Molla ... Shudi Zuo
Stochastic Environmental Research and Risk Assessment | VOL. 37
04 Sep 2022

Random forest for spatial prediction of censored response variables
Francky Fouedjio
Artificial Intelligence in Geosciences | VOL. 2
01 Dec 2021

Spatial Prediction and Mapping of Gully Erosion Susceptibility Using Machine Learning Techniques in a Degraded Semi-Arid Region of Kenya
Kennedy Were ... Ruth Njoroge
Land | VOL. 12
15 Apr 2023
