Comparing and Detecting Stationarity and Dataset Shift

Camilla Da Silva, Jeff Boisvert, Jed Nisenson

doi:10.1007/978-3-031-19845-8_3

Abstract

Machine learning algorithms are increasingly applied to spatial numerical modeling, so it is important to understand when such methods will underperform. Machine learning algorithms are affected by dataset shift: when the modeling domain of interest is non-stationary, there is no guarantee that trained models are effective in unsampled areas. This work compares the stationarity requirement of geostatistical methods with the concept of dataset shift. A workflow is also developed to detect dataset shift in spatial data prior to modeling; it applies a discriminative classifier and a two-sample Kolmogorov-Smirnov test to the model areas. Where required, a lazy learning modification of support vector regression (SVR) is proposed to account for dataset shift. The benefits of the lazy learning algorithm are demonstrated on the well-known non-stationary Walker Lake dataset, where it improves root mean squared error by up to 25% relative to the standard SVR approach in areas where dataset shift is present.
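The detection step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that both areas are described by the same feature columns, that the Kolmogorov-Smirnov test is applied one feature at a time, and that a random forest stands in for the discriminative classifier. The function name `detect_shift`, the significance level `alpha`, and the synthetic data are all hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def detect_shift(sampled, unsampled, alpha=0.05, seed=0):
    """Compare two areas of shape (n, d) and report evidence of dataset shift.

    Hypothetical helper: the choice of features, classifier, and alpha
    are assumptions made for illustration only.
    """
    sampled = np.asarray(sampled, dtype=float)
    unsampled = np.asarray(unsampled, dtype=float)

    # Two-sample Kolmogorov-Smirnov test on each feature separately.
    ks_pvalues = [
        ks_2samp(sampled[:, j], unsampled[:, j]).pvalue
        for j in range(sampled.shape[1])
    ]

    # Discriminative classifier: label each point by its area of origin.
    # Cross-validated accuracy near 0.5 means the areas are hard to tell
    # apart; accuracy well above 0.5 suggests the distributions differ.
    X = np.vstack([sampled, unsampled])
    y = np.concatenate([np.zeros(len(sampled)), np.ones(len(unsampled))])
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    accuracy = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()

    return {
        "ks_pvalues": ks_pvalues,
        "ks_shift": any(p < alpha for p in ks_pvalues),
        "classifier_accuracy": accuracy,
    }


# Synthetic example: the second area has a shifted mean in both features.
rng = np.random.default_rng(0)
area_a = rng.normal(0.0, 1.0, size=(200, 2))
area_b = rng.normal(2.0, 1.0, size=(200, 2))
result = detect_shift(area_a, area_b)
```

In this sketch the two checks are complementary: the Kolmogorov-Smirnov test looks at each marginal distribution separately, while the classifier can also pick up differences that appear only in the joint distribution of the features.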

Full Text