Abstract

Statistical learning theory provides the foundation to applied machine learning, and its various successful applications in computer vision, natural language processing and other scientific domains. The theory, however, does not take into account the unique challenges of performing statistical learning in geospatial settings. For instance, it is well known that model errors cannot be assumed to be independent and identically distributed in geospatial (a.k.a. regionalized) variables due to spatial correlation; and trends caused by geophysical processes lead to covariate shifts between the domain where the model was trained and the domain where it will be applied, which in turn harm the use of classical learning methodologies that rely on random samples of the data. In this work, we introduce thegeostatistical (transfer) learningproblem, and illustrate the challenges of learning from geospatial data by assessing widely-used methods for estimating generalization error of learning models, under covariate shift and spatial correlation. Experiments with synthetic Gaussian process data as well as with real data from geophysical surveys in New Zealand indicate that none of the methods are adequate for model selection in a geospatial context. We provide general guidelines regarding the choice of these methods in practice while new methods are being actively researched.

Highlights

  • Classical learning theory [1,2,3] and its applied machine learning methods have been popularized in the geosciences after various technological advances, leading initiatives in open-source software [4,5,6,7], and intense marketing from a diverse portfolio of industries

  • Having understood the main differences between classical and geostatistical learning, we focus our attention to a specific type of geostatistical transfer learning problem, and illustrate some of the unique challenges caused by spatial dependence

  • We introduce geostatistical learning, and demonstrate how most prior art in statistical learning with geospatial data fits into a category we term pointwise learning

Read more

Summary

Introduction

Classical learning theory [1,2,3] and its applied machine learning methods have been popularized in the geosciences after various technological advances, leading initiatives in open-source software [4,5,6,7], and intense marketing from a diverse portfolio of industries. In spite of its popularity, learning theory cannot be applied straightforwardly to solve problems in the geosciences as the characteristics of these problems violate fundamental assumptions used to derive the theory and related methods Among these methods derived under classical assumptions (more on this later), those for estimating the generalization (or prediction) error of learned models in unseen samples are crucial in practice [2]. If estimates of error are inaccurate because of violated assumptions, there is great chance that models will be selected inappropriately [9]. The issue is aggravated when models of great expressiveness (i.e., many learning parameters) are considered in the collection since they are quite capable of overfitting the available data [10, 11]

Objectives
Methods
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call