Abstract

With modern advances in geographical information systems, remote sensing technologies, and low-cost sensors, we increasingly encounter datasets in which spatial or serial dependence must be accounted for. Dependent observations $(y_1, y_2, \ldots, y_n)$ with covariates $(x_1, \ldots, x_n)$ can be modeled non-parametrically as $y_i = m(x_i) + \epsilon_i$, where $m(x_i)$ is the mean component and $\epsilon_i$ accounts for the dependence in the data. We assume that the dependence is captured through a covariance function of the correlated stochastic process $\epsilon_i$ (second-order dependence). The correlation is typically a function of the "spatial distance" or "time lag" between two observations. Unlike linear regression, non-linear Machine Learning (ML) methods for estimating the regression function $m$ can capture complex interactions among the variables. However, they often fail to account for the dependence structure, resulting in sub-optimal estimation. On the other hand, specialized software for spatial/temporal data properly models the data correlation but lacks flexibility in modeling the mean function $m$ because it focuses only on linear models. RandomForestsGLS bridges the gap through a novel rendition of Random Forests (RF), namely RF-GLS, which explicitly models the spatial/serial data correlation in the RF fitting procedure to substantially improve the estimation of the mean function. Additionally, RandomForestsGLS leverages kriging to perform predictions at new locations for geo-spatial data.
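To make the data model concrete, the following is a minimal Python sketch that simulates spatially dependent observations of the form $y_i = m(x_i) + \epsilon_i$. It is an illustration under stated assumptions, not the RandomForestsGLS API: the sinusoidal mean function, the exponential covariance $\sigma^2 \exp(-\phi d)$, and all parameter values and function names are hypothetical choices made only for this example.

```python
import math
import random


def exponential_covariance(coords, sigma_sq=1.0, phi=3.0):
    """Covariance matrix with entries sigma_sq * exp(-phi * distance).

    The exponential form is an assumed example of a covariance function
    that depends only on the spatial distance between two observations.
    """
    n = len(coords)
    return [
        [sigma_sq * math.exp(-phi * math.dist(coords[i], coords[j]))
         for j in range(n)]
        for i in range(n)
    ]


def cholesky(matrix, jitter=1e-10):
    """Lower-triangular factor L such that L L^T = matrix + jitter * I."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] + jitter - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def mean_function(x):
    """Hypothetical non-linear mean m(x), chosen only for illustration."""
    return 10.0 * math.sin(math.pi * x)


def simulate(n=50, sigma_sq=1.0, phi=3.0, seed=0):
    """Draw locations, covariates, and dependent responses y = m(x) + eps."""
    rng = random.Random(seed)
    coords = [(rng.random(), rng.random()) for _ in range(n)]
    x = [rng.random() for _ in range(n)]
    cov = exponential_covariance(coords, sigma_sq, phi)
    lower = cholesky(cov)
    # Correlated errors: eps = L z with z standard normal, so Cov(eps) = L L^T.
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eps = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    y = [mean_function(x[i]) + eps[i] for i in range(n)]
    return coords, x, y, cov


if __name__ == "__main__":
    coords, x, y, cov = simulate(n=5)
    for location, covariate, response in zip(coords, x, y):
        print(location, covariate, response)
```

A method that ignores `cov` treats the errors as independent; the abstract's point is that accounting for this covariance while estimating $m$ improves the estimate.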


Summary

With modern advances in geographical information systems, remote sensing technologies, and low-cost sensors, we increasingly encounter datasets in which spatial or serial dependence must be accounted for. Dependent observations $(y_1, y_2, \ldots, y_n)$ with covariates $(x_1, \ldots, x_n)$ can be modeled non-parametrically as $y_i = m(x_i) + \epsilon_i$, where $m(x_i)$ is the mean component and $\epsilon_i$ accounts for the dependence in the data. We assume that the dependence is captured through a covariance function of the correlated stochastic process $\epsilon_i$ (second-order dependence). Non-linear Machine Learning (ML) methods for estimating the regression function $m$ can capture complex interactions among the variables, but they often fail to account for the dependence structure, resulting in sub-optimal estimation. RandomForestsGLS bridges the gap through a novel rendition of Random Forests (RF), namely RF-GLS, which explicitly models the spatial/serial data correlation in the RF fitting procedure to substantially improve the estimation of the mean function. RandomForestsGLS also leverages kriging to perform predictions at new locations for geo-spatial data.
