Abstract

Random forest and similar machine learning techniques are already used to generate spatial predictions, but the spatial location of points (geography) is often ignored in the modeling process. Spatial autocorrelation, especially if it persists in the cross-validation residuals, indicates that the predictions may be biased and therefore suboptimal. This paper presents a random forest framework for spatial prediction (RFsp) in which buffer distances from observation points are used as explanatory variables, thus incorporating geographical proximity effects into the prediction process. The RFsp framework is illustrated with examples that use textbook datasets and apply spatial and spatiotemporal prediction to numeric, binary, categorical, multivariate and spatiotemporal variables. The performance of the RFsp framework is compared with state-of-the-art kriging techniques using fivefold cross-validation with refitting. The results show that RFsp can produce predictions that are as accurate and unbiased as those of different versions of kriging. The advantages of RFsp over kriging are that it requires no rigid statistical assumptions about the distribution and stationarity of the target variable, it is more flexible in incorporating, combining and extending covariates of different types, and it possibly yields more informative maps characterizing the prediction error. RFsp appears to be especially attractive for building multivariate spatial prediction models that can be used as “knowledge engines” in various geoscience fields. Some disadvantages of RFsp are the exponentially growing computational intensity as calibration data and covariates increase, and the high sensitivity of predictions to input data quality.
The key to the success of the RFsp framework might be the quality of the training data, especially the quality of spatial sampling (to minimize extrapolation problems and any type of bias in the data) and the quality of model validation (to ensure that accuracy is not affected by overfitting). For many data sets, especially those with fewer points and covariates and close-to-linear relationships, model-based geostatistics can still lead to more accurate predictions than RFsp.
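
The core idea of using buffer distances as explanatory variables can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation (the paper's examples are in R): the function name `buffer_distance_features` and the toy coordinates are assumptions made for this example, and plain Euclidean distance is assumed as the buffer distance. Each prediction location receives one feature per observation point; these rows, together with any other covariates, would then be passed to a random forest learner.

```python
import math


def buffer_distance_features(obs_xy, target_xy):
    """Return one row per target location.

    Column j of each row holds the Euclidean distance from that
    target location to observation point j (hypothetical helper).
    """
    return [
        [math.hypot(tx - ox, ty - oy) for ox, oy in obs_xy]
        for tx, ty in target_xy
    ]


# Toy data (assumed for illustration): three observation points
# and two prediction locations.
obs = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
grid = [(0.0, 0.0), (3.0, 0.0)]

features = buffer_distance_features(obs, grid)
# features[0] is [0.0, 5.0, 10.0]: distances from (0, 0) to each observation
```

For model fitting, the same function can be called with the observation points as targets, which gives a square distance matrix with zeros on the diagonal.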

Highlights

  • Kriging and its many variants have been used as the Best Unbiased Linear Prediction technique for spatial points since the 1960s (Isaaks and Srivastava, 1989; Cressie, 1990; Goovaerts, 1997).

  • We have shown that random forest can be used to generate unbiased spatial predictions and to model and map their uncertainty.

  • The advantage of random forest over linear geostatistical modeling and techniques such as kriging lies in the fact that no stationarity assumptions need to be met, and there is no need to specify transformation or anisotropy parameters (or to fit variograms at all!).


Summary

Introduction

Kriging and its many variants have been used as the Best Unbiased Linear Prediction technique for spatial points since the 1960s (Isaaks and Srivastava, 1989; Cressie, 1990; Goovaerts, 1997). In this paper we describe a generic framework for spatial and spatiotemporal prediction that is based on random forest and which we refer to as “RFsp”. With this framework we aim to include information derived from the observation locations and their spatial distribution in predictive modeling. We explain in detail (using standard data sets) how to extend machine learning to general spatial prediction, and compare the prediction efficiency of random forest with that of state-of-the-art kriging methods using 5-fold cross-validation, refitting the model in each subset (for spatiotemporal kriging, without refitting). All datasets used in this paper are either part of an existing R package or can be obtained from the GitHub repository
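
The evaluation scheme mentioned above, k-fold cross-validation with the model refitted in each fold, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the names `kfold_cv_rmse`, `fit_mean` and `predict_mean` are hypothetical, folds are assigned by a simple random shuffle, and a mean predictor stands in for a random forest or kriging model so that the example stays self-contained.

```python
import random


def kfold_cv_rmse(X, y, fit, predict, k=5, seed=0):
    """Cross-validated RMSE with the model refitted on each training subset."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    # Assign every k-th shuffled index to the same fold.
    folds = [idx[i::k] for i in range(k)]

    squared_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Refit from scratch using only the training part of this fold.
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            squared_errors.append((predict(model, X[i]) - y[i]) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


# Placeholder model (assumption): always predicts the training mean.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


# Toy data (assumed): a constant target, so the mean predictor is exact.
X = [[float(i)] for i in range(10)]
y = [2.0] * 10
rmse = kfold_cv_rmse(X, y, fit_mean, predict_mean)
```

Each observation is predicted exactly once, by a model that never saw it during fitting, which is what makes the resulting error estimate comparable across methods.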
