Abstract
Spatial machine learning models are often developed from observations with substantial unexplainable variability, sometimes called ‘noise’. When evaluating these models, traditional point-scale metrics (e.g., R²) can be misleading if used alone. We present a multi-scale performance evaluation (MPE) that supplements point-scale metrics with two additional scales: distributional and geostatistical. We apply the MPE framework to predictions of depth to bedrock (DTB) in the Delaware River Basin. Geostatistical analysis shows that approximately one third of the DTB variance occurs at spatial scales smaller than 2 km. Hence, we interpret our point-scale R² of 0.3 (testing data) to be sufficient for regional-scale modelling. Bias-correction methods improve performance at two of the three MPE scales: the point-scale change is negligible, while distributional and geostatistical performance improves. In contrast, bias correction applied to a global DTB model does not improve MPE performance. This work encourages scale-appropriate performance evaluations to enable effective model intercomparison.
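To make the three evaluation scales concrete, the following is a minimal sketch of one metric per scale. The abstract does not name the specific distributional or geostatistical metrics used, so the choices here are assumptions made for illustration: R² for the point scale, a two-sample Kolmogorov-Smirnov statistic for the distributional scale, and a binned empirical semivariogram for the geostatistical scale. The function names and lag bins are hypothetical.

```python
import math
from bisect import bisect_right


def r2_score(obs, pred):
    """Point scale: coefficient of determination of paired values."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def ks_statistic(obs, pred):
    """Distributional scale (assumed metric): largest gap between the
    empirical CDFs of observations and predictions, ignoring pairing."""
    a, b = sorted(obs), sorted(pred)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def empirical_variogram(coords, values, bin_edges):
    """Geostatistical scale (assumed metric): mean semivariance of all
    point pairs whose separation falls in each lag bin [edge_k, edge_k+1)."""
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            lag = math.dist(coords[i], coords[j])
            k = bisect_right(bin_edges, lag) - 1
            if 0 <= k < n_bins:
                sums[k] += 0.5 * (values[i] - values[j]) ** 2
                counts[k] += 1
    # Bins with no pairs are reported as NaN rather than zero.
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]
```

Under these assumptions, geostatistical performance could be judged by computing `empirical_variogram` separately for observed and predicted DTB at the same locations and comparing the two curves lag by lag. The semivariance at the shortest lags, relative to the total variance, indicates how much variability sits below the scale a regional model can resolve.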