Recent country- and continental-scale digital soil mapping efforts have used a single model to predict soil properties across large regions. However, different ecophysiographic regions within large-extent areas are likely to have different soil-landscape relationships, so models built specifically for these regions may capture these relationships more accurately than a ‘global’ model. We ask the question: Is a single ‘global’ model sufficient, or are regionally specific models useful for accurate digital soil mapping? We address this question by modeling soil depth classes across the 432,000 km² upper Colorado River Basin in the western USA using a single global model, multiple ecophysiographic models, and ensembles of the ecophysiographic models.

Effective soil depth class observations (n = 12,194) were derived from multiple soil databases. Fifty-seven environmental covariates were derived from a 30 m digital elevation model, climate data, satellite imagery, and aeroradiometric data. Three independent land classifications were used to stratify the area. Two expert-derived land classifications, USDA Major Land Resource Areas (MLRA) and US-EPA Level III ecoregions, divided the study area into multiple ecophysiographic regions based on vegetation and broad-scale physiographic differences. The third land classification divided the study area into broad landforms.

Soil depth observations were split into separate training (n = 10,470) and validation (n = 1,724) datasets. First, a ‘global’ random forest model was used to model soil depth classes using all training observations and covariates. ‘Global’ denotes a model built with all training data across the extent of the area, not a model at world extent. Second, the land classifications were used to subset the observations into ecophysiographic sub-datasets, and random forest models were refit for each region. Models fit by ecophysiographic region are referred to as regional models. Third, predictions from each regional model were fused into regional-ensemble models. Accuracy, Brier scores, and Shannon’s entropy were used to compare model accuracy and uncertainty. Regional ecophysiographic models were also compared to models built for geographic areas that were defined solely to be approximately equal in area. Training dataset density and the imbalance ratio were investigated to determine whether data characteristics influenced regional accuracy and uncertainty metrics.

Accuracy for the global model using the validation set was 62.8%. Regional model accuracies ranged between 56.1% and 75.0%. We found: 1) useful inter-regional differences in global model accuracy were revealed when the global model was validated by region, 2) no consistent relationship between training observation density and accuracy/uncertainty metrics, 3) no meaningful differences in accuracy and uncertainty metrics between physiographic and geographic regions, 4) ensembles of regionally specific models were approximately as accurate as global models, and 5) both region-specific models and ensembles of regional models were less uncertain than the global model. Overall, we recommend the use of soil depth class predictions made from MLRA regional ensemble models because these predictions had higher accuracy than the ecoregion ensemble model predictions and lower uncertainty than both the global model and the landform ensemble model predictions. We answer our question: Ensembles of regionally specific models are approximately as accurate as global models, but result in less uncertainty.
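
The three comparison metrics named above can be illustrated with a minimal sketch. The abstract does not give the formulas used, so this assumes the standard multiclass Brier score (squared error against a one-hot truth vector, summed over classes and averaged over observations) and Shannon entropy normalized by the log of the number of classes; the function names and the example probabilities are hypothetical and are not taken from the study.

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of observations whose most probable class matches the truth."""
    probs = np.asarray(probs, dtype=float)
    return float(np.mean(np.argmax(probs, axis=1) == np.asarray(labels)))

def brier_score(probs, labels):
    """Multiclass Brier score (assumed form): 0 is perfect, 2 is worst."""
    probs = np.asarray(probs, dtype=float)
    # One-hot encode the true class index of each observation.
    onehot = np.eye(probs.shape[1])[np.asarray(labels)]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

def shannon_entropy(probs, normalize=True):
    """Per-observation entropy of the predicted class probabilities."""
    probs = np.asarray(probs, dtype=float)
    # Replace zeros by 1 inside the log so that 0 * log(0) contributes 0.
    safe = np.where(probs > 0, probs, 1.0)
    h = -np.sum(probs * np.log(safe), axis=1)
    if normalize:
        # Scale to [0, 1]: 0 = fully certain, 1 = uniform over all classes.
        h = h / np.log(probs.shape[1])
    return h

# Hypothetical class probabilities for three observations and three depth classes.
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.3, 0.3, 0.4]]
labels = [0, 1, 0]  # true class index per observation

print("accuracy:", accuracy(probs, labels))
print("Brier score:", brier_score(probs, labels))
print("mean normalized entropy:", float(shannon_entropy(probs).mean()))
```

Under these assumed definitions, a model can match another in accuracy while producing a lower mean entropy, which is the kind of difference the abstract reports between the regional ensembles and the global model.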