Abstract

Spatial data mining helps to find hidden but potentially informative patterns in large and high-dimensional geoscience data. Non-spatial learners generally relate observations only through their positions in the feature space, so they cannot account for spatial relationships between regionalised variables. This study introduces a novel spatial random forests technique based on higher-order spatial statistics for the analysis and modelling of spatial data. Unlike the classical random forests algorithm, which uses pixelwise spectral information as predictors, the proposed spatial random forests algorithm uses local spatial-spectral information (i.e., vectorised spatial patterns) to learn intrinsic heterogeneity, spatial dependencies, and complex spatial patterns. Algorithms for supervised (i.e., regression and classification) and unsupervised (i.e., dimension reduction and clustering) learning are presented. Approaches to deal with big data, multi-resolution data, and missing values are discussed. The superior performance and usefulness of the proposed algorithm over the classical random forests method are illustrated via synthetic and real cases, where remotely sensed geophysical covariates in the North West Minerals Province of Queensland, Australia, are used as input spatial data for geology mapping, geochemical prediction, and process discovery analysis.
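
To make the contrast between pixelwise predictors and vectorised spatial patterns concrete, the following is a minimal sketch rather than the method of the study. It assumes a regular grid with a few bands and simply flattens the square neighbourhood around each pixel into one feature row. The function name `vectorise_patterns`, the `half_width` parameter, and the reflect padding at the grid edges are illustrative assumptions, and the sketch does not implement the higher-order spatial statistics used by the proposed algorithm.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def vectorise_patterns(grid, half_width=1):
    """Flatten the square neighbourhood of every pixel into one feature row.

    grid has shape (rows, cols, bands). The result has shape
    (rows * cols, bands * size * size) with size = 2 * half_width + 1.
    Hypothetical helper for illustration only.
    """
    h = half_width
    size = 2 * h + 1
    # Assumption: pad by reflection so that edge pixels also get a full window.
    padded = np.pad(grid, ((h, h), (h, h), (0, 0)), mode="reflect")
    # Windows over the two spatial axes: shape (rows, cols, bands, size, size).
    windows = sliding_window_view(padded, (size, size), axis=(0, 1))
    rows, cols, bands = grid.shape
    return windows.reshape(rows * cols, bands * size * size)


# Synthetic two-band grid standing in for geophysical covariates.
rng = np.random.default_rng(0)
grid = rng.normal(size=(20, 30, 2))

pixelwise = grid.reshape(-1, 2)                  # classical predictors: one value per band
spatial = vectorise_patterns(grid, half_width=1)  # 3 x 3 pattern per band

print(pixelwise.shape, spatial.shape)  # (600, 2) (600, 18)
```

Either feature matrix could then be passed to any random forests implementation. The only difference in this sketch is that each row of `spatial` carries the local neighbourhood of a pixel rather than the pixel value alone.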

Highlights

  • Spatial data mining reveals hidden and previously unknown but potentially informative patterns from big and high-dimensional geoscience data

  • The objective of this study is to develop a spatial random forests (SRF) technique based on nonparametric higher-order spatial statistics for spatial data analysis and modelling

  • The same order of spatial statistics was selected for the two input variables, E1 = E2 (Fig. 3(d) and (e))


Summary

Introduction

Spatial data mining reveals hidden and previously unknown but potentially informative patterns from big and high-dimensional geoscience data. It takes advantage of the ever-growing availability of geographically referenced data and their potential abundance (Sellars 2018). Geoscience processes vary significantly through time and space. Such heterogeneity and non-stationarity are related to the spatial and/or temporal variation of soil types, rock types, land uses, vegetation types, climatic conditions, and tectonic activities. Geographical observations that are located close to each other in space and time tend to share similar characteristics. This phenomenon is known as autocorrelation and provides additional information to inform statistical models (Matheron 1962; Cliff and Ord 1973). Because non-spatial learners ignore this dependence, their predictions and inferences can be misleading when applied to geoscience data (Reichstein et al. 2019; Bergen et al. 2019; Karpatne et al. 2019).
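
As an illustration of what autocorrelation means in practice, the sketch below computes Moran's I, a standard global measure of spatial autocorrelation, on a regular grid. It is not taken from the study. The function name `morans_i` and the choice of binary rook-contiguity weights (each cell linked to its horizontal and vertical neighbours) are assumptions made for this example.

```python
import numpy as np


def morans_i(grid):
    """Moran's I for a 2-D grid with binary rook-contiguity weights.

    Hypothetical helper for illustration: every horizontally or vertically
    adjacent pair of cells has weight 1, all other pairs have weight 0.
    """
    z = grid - grid.mean()
    # Sum of z_i * z_j over neighbouring pairs; each pair counts in both directions.
    cross = 2.0 * ((z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum())
    # Total weight W equals the number of directed neighbour links.
    n_links = 2.0 * (z[:, :-1].size + z[:-1, :].size)
    return (z.size / n_links) * cross / (z ** 2).sum()


rng = np.random.default_rng(0)
smooth = np.add.outer(np.arange(20.0), np.arange(20.0))  # nearby cells are similar
noise = rng.normal(size=(50, 50))                         # no spatial structure
checker = np.indices((8, 8)).sum(axis=0) % 2              # neighbours always differ

print(morans_i(smooth), morans_i(noise), morans_i(checker))
```

Values close to 1 indicate that neighbouring cells resemble each other, values near 0 indicate no spatial structure, and values close to -1 indicate that neighbours systematically differ, as in the checkerboard case.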

