Soil Series Mapping By Knowledge Discovery from an Ohio County Soil Map

Sakthi K. Subburayalu, Brian K. Slater

doi:10.2136/sssaj2012.0321

Abstract

Machine learning can be used to derive predictive spatial models from existing soil maps in order to update soil surveys, improve the efficiency of new surveys in similar landscapes, and disaggregate map units that contain multiple soil series, such as those in the Soil Survey Geographic Database (SSURGO). One challenge in using aggregated soil map units to train machine learning systems to map series is ambiguity in labeling the training set: because a SSURGO map unit can contain more than one component soil series, it is unclear which series should be assigned to each training instance. Disambiguation of training instances is proposed as a technique to handle this ambiguity. The k‐nearest neighbor (kNN) algorithm, which labels each training instance according to its closest training examples in attribute space, using the list of component soil series available in the SSURGO tabular data, is proposed as a viable method for assigning the most likely soil series to training instances. Two learning algorithms, J48 (a classification tree algorithm) and Random Forest (an ensemble classifier), were applied to evaluate soil series prediction for Monroe County, Ohio. The results showed an improvement in prediction accuracy with disambiguation using kNN. Of the two learning algorithms, Random Forest performed better in mapping major soils; however, J48 predicted some minor soils that Random Forest did not. The maps were useful in identifying areas of uncertainty, such as misplaced polygon boundaries, inclusions, and incorrect labeling, which could guide further field investigations and help rationalize the mapping intensity for SSURGO maps.
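
The disambiguation idea can be illustrated with a minimal sketch. It is not the authors' implementation: the voting rule (each neighbor splits one vote evenly among its own candidate series, and only series listed for the instance's map unit are counted), the function name `disambiguate`, the two-attribute feature vectors, and the series names `SeriesA` and `SeriesB` are all assumptions made for illustration.

```python
import math
from collections import Counter


def disambiguate(features, candidates, k=3):
    """Pick one series per instance from its candidate set by kNN voting.

    features   -- list of attribute vectors (hypothetical terrain attributes)
    candidates -- list of sets; the component series of each instance's map unit
    """
    labels = []
    for i, (x, own) in enumerate(zip(features, candidates)):
        if len(own) == 1:
            # Single-component map unit: nothing to disambiguate.
            labels.append(next(iter(own)))
            continue
        # The k closest other instances in attribute space (Euclidean distance).
        neighbors = sorted(
            (math.dist(x, y), j) for j, y in enumerate(features) if j != i
        )[:k]
        votes = Counter()
        for _, j in neighbors:
            for series in candidates[j]:
                if series in own:
                    # An ambiguous neighbor splits its vote among its candidates.
                    votes[series] += 1.0 / len(candidates[j])
        # Ties go to the alphabetically first candidate (an arbitrary choice).
        labels.append(max(sorted(own), key=lambda s: votes[s]))
    return labels


# Toy data: five unambiguous instances and two from a two-component map unit.
features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # cluster near the origin
    (1.0, 1.0), (1.1, 1.0),               # cluster near (1, 1)
    (0.05, 0.05), (1.05, 0.95),           # ambiguous instances
]
candidates = [
    {"SeriesA"}, {"SeriesA"}, {"SeriesA"},
    {"SeriesB"}, {"SeriesB"},
    {"SeriesA", "SeriesB"}, {"SeriesA", "SeriesB"},
]

print(disambiguate(features, candidates, k=3))
```

In this toy case, each ambiguous instance takes the series that dominates among its nearest neighbors; the disambiguated labels could then be passed to a classifier such as a decision tree or Random Forest.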
