Machine learning (ML) methods continue to gain traction in the hydrological sciences for predicting variables at large scales, yet their spatial transferability remains a critical and underexamined aspect. We present a metamodel approach for obtaining large-scale estimates of drain fraction at 10 m spatial resolution using an ML algorithm (gradient boosting decision trees). Our variable of interest is drain flow, because artificial drainage of agricultural land is widespread in areas with high groundwater tables. Drainage has significant effects on the hydrological cycle, influencing groundwater recharge, streamflow partitioning and nutrient transport. Drain flow is controlled by small-scale variations in topography, geology and groundwater depth, which makes it challenging to estimate at large scales. Drain fraction is defined as the average ratio between drain flow and precipitation. The metamodel combines covariates based on topography, land use and geology with simulated drain fractions from 45 field-scale, physically based hydrological models of Danish drain catchments. The 45 models were jointly calibrated against time series of drain flow observations. The metamodel was then used to upscale predictions of drain fraction to all Danish agricultural land. This involved considerable extrapolation beyond the 45 drain catchments used for training, calling for an assessment of spatial transferability. To map the transferability of the metamodel and distinguish areas where its results are reliable from areas where they are not, we used the concept of the area of applicability (AOA). The AOA is determined by comparing the covariate values at each prediction point with the covariate space covered by the training data, on the assumption that model performance correlates with covariate similarity. The AOA mapping showed that 71% of Denmark's agricultural land falls within the AOA of the metamodel.
The study presents a stepwise methodology for obtaining national-scale results from an ML model trained on local-scale numerical models, together with an evaluation of its spatial transferability, which is highly relevant for decision-support purposes.
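To illustrate the AOA idea described above, the following is a minimal sketch of a dissimilarity-based applicability check. It is not the implementation used in the study. It rests on simplifying assumptions that are not stated in the abstract: covariates are standardized and left unweighted, distances are Euclidean, the dissimilarity of each training point is taken to its nearest other training point, and the threshold is the 0.95 quantile of those training dissimilarities. The function names, the `quantile` parameter and the random covariates are all hypothetical.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def area_of_applicability(train_x, new_x, quantile=0.95):
    """Return (dissimilarity index, inside-AOA mask) for each row of new_x.

    Simplified sketch: unweighted covariates and a quantile threshold are
    assumptions made for illustration only.
    """
    # Standardize all covariates using the training mean and spread.
    mean = train_x.mean(axis=0)
    std = train_x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant covariates
    train_z = (train_x - mean) / std
    new_z = (new_x - mean) / std

    # Mean distance between distinct training points, used as a scale.
    d_train = pairwise_distances(train_z, train_z)
    n = len(train_z)
    mean_dist = d_train.sum() / (n * (n - 1))

    # Dissimilarity of each training point to its nearest other training point.
    np.fill_diagonal(d_train, np.inf)
    di_train = d_train.min(axis=1) / mean_dist
    threshold = np.quantile(di_train, quantile)

    # Dissimilarity of each prediction point to its nearest training point.
    di_new = pairwise_distances(new_z, train_z).min(axis=1) / mean_dist
    return di_new, di_new <= threshold


# Hypothetical covariates: 45 training catchments, 3 covariates each.
rng = np.random.default_rng(0)
train_covariates = rng.normal(size=(45, 3))
grid_covariates = rng.normal(scale=2.0, size=(1000, 3))

di, inside = area_of_applicability(train_covariates, grid_covariates)
print(f"Share of prediction points inside the AOA: {inside.mean():.2f}")
```

Prediction points whose dissimilarity index exceeds the threshold lie in parts of covariate space that the training data do not cover, so predictions there would be flagged as extrapolation.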