Assessing the influence of environmental factors and datasets on soil type prediction with two machine learning algorithms in a heterogeneous area in the Rur catchment, Germany

Tanja Kramm, Dirk Hoffmeister

doi:10.1016/j.geodrs.2020.e00316

Abstract

Machine learning (ML) algorithms are a promising alternative to traditional acquisition methods for creating new soil maps or updating existing ones. This study analyses the suitability of two ML techniques for predicting 36 different soil types in the Rur catchment in North Rhine-Westphalia (Germany). For this purpose, the performance of random forest (RF) and artificial neural network (ANN) classifiers was investigated for three scenarios with varying environmental co-variables and for two training datasets with different sampling strategies. The study analyses how the accuracy of the classified digital soil maps is affected by the diversity of soil types within different landscapes of the catchment, by varying topography, by the spatial resolution of the co-variables, and by the distribution of training points. Co-variables derived from a digital elevation model (DEM) were generated once from a high-resolution DEM based on airborne laser scanning data at a spatial resolution of 15 m and once from the 90 m TanDEM-X WorldDEM™. In general, the RF classification performed best, reaching overall accuracies (OA) above 70% with a spatially homogenized training dataset. The ANN classifier scored on average about 5% lower than RF. Furthermore, for both algorithms the OA is about 15% to 25% lower in the northernmost and central parts of the study area, which have a very diverse distribution of soil types, than in regions with only a few dominating soil types. The drop in accuracy in heterogeneous regions was particularly large for the ANN classifier with spatially homogenized training samples. A comparison of predictor variables derived from DEM sources with greatly differing spatial resolutions showed similar results for both datasets; no increase in accuracy with higher spatial resolution was detected.
Overall, the classification accuracy is mainly affected by the sampling strategy for the training samples, the diversity of soil types and the availability of predictive environmental co-variables. In contrast, the influence of topography and of the spatial resolution of the DEM used to generate predictor variables was only minor.

Full Text