Abstract

Machine learning algorithms are widely used in geospatial applications, including spatial mapping and the upscaling of ecological variables and traits. Multivariate splines, random forests, and neural networks are commonly used to upscale a few sparse measurements to larger areas. Machine learning models, however, cannot offer reliable predictions in out-of-sample areas, which is often the case in such applications [1,2]. In [3], an area of applicability is proposed as an extrapolation index based on the minimum distance to the training data in the multidimensional predictor space, with predictors weighted by their respective importance in the model. We propose using Gaussian Processes (GPs) to derive such an extrapolation indicator [4]. A GP is a popular method in machine learning and multivariate statistics for regression problems. It provides a probabilistic description of the predictive function, so one can derive both the predictive mean and the predictive variance for new data. We suggest using the predictive variance as an indicator of extrapolation and show its relation to a customized dissimilarity index that follows the Area of Applicability methodology proposed in [3]. We show this relation, and in some cases the generalization, in a set of controlled synthetic experiments and for the global mapping of vegetation traits using remote sensing, meteorological variables, and the (huge yet sparse and biased) TRY database. This relation opens the door to a sounder way of identifying and characterizing extrapolation regimes through GPs in geospatial and upscaling applications.
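
To make the two indicators concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a squared-exponential kernel with one lengthscale per predictor and fixed, hand-picked hyperparameters; the function names, the toy data, and the choice of inverse lengthscales as predictor weights in the distance index are all illustrative assumptions. It computes the GP predictive variance and a weighted minimum distance to the training data for one point inside the training domain and one far outside it.

```python
import numpy as np

def rbf_kernel(A, B, lengthscales, signal_var=1.0):
    """Squared-exponential kernel with one lengthscale per predictor."""
    A = A / lengthscales
    B = B / lengthscales
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0))

def gp_predictive_variance(X_train, X_new, lengthscales, signal_var=1.0, noise_var=1e-2):
    """Predictive variance of the latent function at X_new.

    var(x*) = k(x*, x*) - k*^T (K + noise_var * I)^{-1} k*
    It depends only on the predictor locations, not on the observed targets.
    """
    K = rbf_kernel(X_train, X_train, lengthscales, signal_var)
    K += noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_new, lengthscales, signal_var)
    L = np.linalg.cholesky(K)
    v = np.linalg.solve(L, K_s)  # v^T v equals k*^T K^{-1} k*
    return signal_var - np.sum(v**2, axis=0)

def min_weighted_distance(X_train, X_new, weights):
    """Minimum weighted Euclidean distance from each new point to the training set."""
    diff = (X_new[:, None, :] - X_train[None, :, :]) * weights
    return np.sqrt(np.sum(diff**2, axis=-1)).min(axis=1)

# Toy data (hypothetical): training predictors drawn from the unit square.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(50, 2))
X_new = np.array([[0.5, 0.5],   # inside the training domain
                  [3.0, 3.0]])  # far outside the training domain

lengthscales = np.array([0.3, 0.3])  # assumed, not fitted
var = gp_predictive_variance(X_train, X_new, lengthscales)
dist = min_weighted_distance(X_train, X_new, weights=1.0 / lengthscales)

print("predictive variance:", var)
print("min weighted distance:", dist)
```

Both quantities grow as a new point moves away from the training data, which is the qualitative behaviour the comparison above relies on. In practice the kernel hyperparameters would be estimated from the data rather than fixed by hand.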
