Spatial interpolation is essential for handling sparse and missing spatial data. Current machine learning-based spatial interpolation methods are subject to the statistical constraints of spatial stratified heterogeneity (SSH) and typically model each stratum separately, integrating intra-stratum and inter-strata features through simple weighted averaging. However, these models overlook the fact that inter-strata features contribute differently to different locations within a stratum (heterogeneous inter-strata associations, HIA), and they do not explain how spatial effects influence the interpolation process, which leads to suboptimal and unreliable interpolation outcomes. This article proposes a novel explainable spatial interpolation method that considers SSH (X-SSHM). Spatial and environmental features are used to describe intra-stratum and inter-strata information and are fed into random forest-based learners to achieve high-level semantic feature mapping. Geographically weighted regression is then employed to integrate the intra-stratum and inter-strata features, giving a unified expression of SSH and HIA and producing the final interpolation result. Geographically weighted Shapley (GSHAP) is proposed to decompose the marginal contributions of intra-stratum and inter-strata features. Model performance was evaluated on simulated and soil organic matter datasets. X-SSHM outperformed five baselines in interpolation accuracy. Moreover, statistical methods validated X-SSHM’s ability to elucidate the mechanisms by which SSH, spatial autocorrelation and HIA affect the model's interpolation process.
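As a rough illustration of the integration step only, the sketch below fits a locally weighted regression that combines an intra-stratum prediction and an inter-strata prediction at a target location. It is not the article's implementation. The Gaussian kernel, the fixed bandwidth, the two-predictor form without an intercept, and the names `gwr_local_weights`, `gwr_predict` and `bandwidth` are all assumptions made for illustration, and the inputs `intra` and `inter` merely stand in for the outputs of the random forest-based learners.

```python
import math


def gwr_local_weights(coords, intra, inter, y, target, bandwidth):
    """Estimate local coefficients (b_intra, b_inter) at `target`.

    Assumed model (illustrative): y ~ b_intra * intra + b_inter * inter,
    fitted by weighted least squares with a Gaussian distance kernel.
    """
    s_aa = s_ab = s_bb = s_ay = s_by = 0.0
    for (cx, cy), a, b, obs in zip(coords, intra, inter, y):
        # Nearby samples get larger weights than distant ones.
        d = math.hypot(cx - target[0], cy - target[1])
        w = math.exp(-0.5 * (d / bandwidth) ** 2)
        s_aa += w * a * a
        s_ab += w * a * b
        s_bb += w * b * b
        s_ay += w * a * obs
        s_by += w * b * obs

    # Solve the 2x2 weighted normal equations in closed form.
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("local design is singular; try a larger bandwidth")
    b_intra = (s_ay * s_bb - s_by * s_ab) / det
    b_inter = (s_aa * s_by - s_ab * s_ay) / det
    return b_intra, b_inter


def gwr_predict(coords, intra, inter, y, target, target_intra, target_inter,
                bandwidth):
    """Combine the two feature predictions at `target` with local weights."""
    b_intra, b_inter = gwr_local_weights(
        coords, intra, inter, y, target, bandwidth
    )
    return b_intra * target_intra + b_inter * target_inter
```

Because the coefficients are re-estimated at every target location, the relative weight given to inter-strata information can vary within a stratum, which is the behavior a single global weighted average cannot express.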