Semi-automated disaggregation of conventional soil maps using knowledge driven data mining and classification trees

Travis W Nauman, James A Thompson

doi:10.1016/j.geoderma.2013.08.024

Abstract

Disaggregation of conventional soil surveys has been identified as a potential source for much of the next generation of model-ready digital soil spatial data. This process aims to apportion vector soil surveys into raster (gridded) representations of the component soils that are often aggregated together in map unit designs. Most soil surveys are published with some description of the soil–landscape relationships that distinguish component soils within map units. We used these descriptions, as recorded in the Soil Survey Geographic (SSURGO) database for Webster and Pocahontas Counties in West Virginia, USA, together with 1-arc-second digital elevation data and derived geomorphic indices, to build a set of representative training areas for all soil components. These training areas, combined with a more extensive environmental database, were then used to train classification tree ensembles that transformed the original SSURGO map into a gridded soil series map. From the models we also created underlying prediction frequency surfaces that can be used to build continuous representations of soil class and property distributions. Disaggregation models matched training sets in 71–74% of pixels and matched components in original SSURGO map units in 56–65% of the study area. We evaluated both the original SSURGO data and our models using 87 independent pedons not used in model building. Validation pedons matched components in SSURGO map units at 39% of sites, but in map units with only one named component (as opposed to multiple soils that could be matched to validation pedons), only 27% of the sites matched. Disaggregation predictions matched validation pedon classes 22–24% of the time using nearest-neighbor spatial matching, and these rates increased to 39–44% when a correct prediction anywhere within a 60-meter radius of the pedon was counted as a match. To characterize uncertainty, we compared the relative ensemble prediction frequency (probability) of the final hardened model classes at validation sites.
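The abstract does not state how the ensemble output is hardened into a single class. The Python sketch below assumes a simple majority vote: each pixel takes the class predicted by the most ensemble members, and its relative prediction frequency is that class's share of the votes. The function name `harden_ensemble` and the layout (each member map as a flattened list of class labels) are hypothetical illustrations, not the authors' code.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def harden_ensemble(
    member_maps: Sequence[Sequence[str]],
) -> Tuple[List[str], List[float]]:
    """Hypothetical majority-vote hardening of an ensemble of class maps.

    Each element of member_maps is one ensemble member's prediction for every
    pixel (a flattened raster of class labels). Returns the hardened class map
    and the relative prediction frequency of the hardened class at each pixel.
    """
    if not member_maps:
        raise ValueError("need at least one ensemble member")
    n_pixels = len(member_maps[0])
    if any(len(m) != n_pixels for m in member_maps):
        raise ValueError("all member maps must have the same number of pixels")

    hard: List[str] = []
    freq: List[float] = []
    for px in range(n_pixels):
        # Count how many members predicted each class at this pixel.
        votes = Counter(m[px] for m in member_maps)
        # most_common orders equal counts by first occurrence, so ties go to
        # the class predicted by the earliest member in the list.
        cls, n = votes.most_common(1)[0]
        hard.append(cls)
        freq.append(n / len(member_maps))
    return hard, freq


members = [["A", "B", "C"], ["A", "C", "C"], ["B", "C", "C"], ["A", "B", "A"]]
hardened, frequency = harden_ensemble(members)
```

Here the frequency list plays the role of a prediction frequency surface: a value of 0.75 means three of the four members agreed with the hardened class at that pixel. Other hardening rules (for example, weighting members) would change both outputs.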
Sites with correct predictions generally had higher prediction frequencies, which led us to use these frequencies to build an uncertainty model. Uncertainty was calculated as the rate of correct predictions at validation sites within different intervals of prediction frequency, using the nearest-neighbor validation results. We were able to discern four uncertainty classes with values of 7%, 18%, 20% and 43%, which we called “ground truth probabilities”. We present the methods to create these models as a specific example of how disaggregation techniques may be used to aid in updating national soil survey inventories.
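The interval-based uncertainty calculation can be sketched as follows: group validation sites by the prediction frequency of their hardened class, then take the fraction of correct nearest-neighbor predictions in each group. The function name and the interval break points are hypothetical; the abstract does not give the intervals used, and this toy example does not reproduce the reported 7%, 18%, 20% and 43% values.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def ground_truth_probabilities(
    freqs: Sequence[float],
    correct: Sequence[bool],
    breaks: Sequence[float],
) -> List[Tuple[Interval, float, int]]:
    """Hypothetical per-interval rate of correct predictions.

    freqs:   prediction frequency of the hardened class at each validation site
    correct: whether the nearest-neighbor prediction matched the pedon class
    breaks:  ascending interval edges; intervals are [lo, hi), except that the
             last one also includes its upper edge
    Returns (interval, rate of correct predictions, number of sites) per interval;
    the rate is NaN for an interval with no sites.
    """
    if len(freqs) != len(correct):
        raise ValueError("freqs and correct must have the same length")
    n_bins = len(breaks) - 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for f, ok in zip(freqs, correct):
        if not breaks[0] <= f <= breaks[-1]:
            continue  # site falls outside the binned range
        # Index of the interval whose lower edge is <= f; clamp the top edge.
        i = min(bisect_right(breaks, f) - 1, n_bins - 1)
        totals[i] += 1
        hits[i] += int(ok)
    return [
        ((breaks[i], breaks[i + 1]),
         hits[i] / totals[i] if totals[i] else float("nan"),
         totals[i])
        for i in range(n_bins)
    ]


site_freqs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.9]
site_correct = [False, True, False, False, True, True]
table = ground_truth_probabilities(site_freqs, site_correct, [0.0, 0.5, 1.0])
```

Each rate can then be assigned back to every pixel whose prediction frequency falls in the same interval, turning a frequency surface into a map of estimated ground truth probability.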

Full Text