Abstract

Information on the spatial distribution of soil pH is essential for assessing soil quality and soil productivity. Digital soil mapping (DSM) is commonly used to predict soil properties across various types of landscapes. Over the past decade, researchers have made progress in using machine learning techniques to provide reliable predictions of soil properties from limited data. DSM studies often use a single learning approach, in which one machine learner systematically extracts soil–environment relationships from a large database and the fitted model is then used to predict soil information in unmapped areas. Ensemble learning, especially approaches that combine several base learners, has rarely been tested in DSM. We developed a workflow that uses an ensemble learning algorithm to predict soil properties for the Thompson-Okanagan region of British Columbia, Canada. Here, we focused on soil pH and tested a variety of base learners. Base learners with high prediction accuracies were then used to construct a SuperLearner (SL) that extracts the complex relationships between soil properties and environmental variables. The fitted SL was then used to predict soil properties at 25 m spatial resolution for three depth intervals (0–5, 5–15, and 15–30 cm). Prediction accuracy was assessed on an independent test dataset; the SL achieved an accuracy similar to that of the best individual base learners. Using the heterogeneous ensemble learning approach with a weighted-average stacked generalization process eliminated the need to choose the best base learner.
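The SuperLearner workflow summarized above (out-of-fold predictions from several base learners, combined by a weighted average) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the three base learners (`fit_ols`, `fit_mean`, `fit_knn`) are hypothetical stand-ins for the study's learners, the synthetic data replace the soil pH observations and DEM-derived covariates, and the non-negative weights are found by projected gradient descent and then normalized to sum to 1, in the spirit of the non-negative least squares metalearner commonly paired with SuperLearner. `scipy.optimize.nnls` could replace the hand-rolled solver if SciPy is available.

```python
import numpy as np


def fit_ols(X, y):
    """Hypothetical base learner: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta


def fit_mean(X, y):
    """Hypothetical base learner: predicts the training mean everywhere."""
    m = float(np.mean(y))
    return lambda Xn: np.full(len(Xn), m)


def fit_knn(X, y, k=5):
    """Hypothetical base learner: average of the k nearest training points."""
    def predict(Xn):
        d2 = ((Xn[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d2, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict


def nnls_weights(Z, y, iters=20000):
    """Approximate non-negative least squares: projected gradient on 0.5*||Zw - y||^2."""
    G, b = Z.T @ Z, Z.T @ y
    step = 1.0 / np.linalg.norm(G, 2)  # 1 / Lipschitz constant of the gradient
    w = np.full(Z.shape[1], 1.0 / Z.shape[1])
    for _ in range(iters):
        w = np.maximum(w - step * (G @ w - b), 0.0)  # gradient step, then clip to w >= 0
    return w


def super_learner(X, y, learners, v=5, seed=0):
    """Weighted-average stacking: V-fold out-of-fold predictions -> NNLS weights -> refit."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % v
    Z = np.zeros((len(y), len(learners)))  # column j = out-of-fold predictions of learner j
    for f in range(v):
        train, test = folds != f, folds == f
        for j, fit in enumerate(learners):
            Z[test, j] = fit(X[train], y[train])(X[test])
    w = nnls_weights(Z, y)
    total = w.sum()
    w = w / total if total > 0 else np.full(len(learners), 1.0 / len(learners))
    fitted = [fit(X, y) for fit in learners]  # each base learner refit on all training data

    def predict(Xn):
        return np.column_stack([model(Xn) for model in fitted]) @ w

    return w, predict


# Demo on synthetic data (made up; not the study's soil observations).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = 5.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=300)
X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]
weights, sl_predict = super_learner(X_tr, y_tr, [fit_ols, fit_mean, fit_knn])
test_rmse = float(np.sqrt(np.mean((sl_predict(X_te) - y_te) ** 2)))
print("weights (OLS, mean, kNN):", np.round(weights, 3))
print("SuperLearner test RMSE:", round(test_rmse, 3))
```

Because every weight is estimated from out-of-fold predictions, a learner that merely overfits its training folds is not rewarded, and a learner with no predictive value (here `fit_mean`) should end up with a weight near zero. This is how stacking can remove the need to pick a single best learner in advance.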

Highlights

  • The objectives of this study were to (1) evaluate and compare a set of base learners, (2) test the potential of using the ensemble learning approach with stacked generalization to extract the relationships between soil properties and environmental variables derived from a digital elevation model (DEM), and (3) compare the ensemble learner with the individual base learners for mapping the spatial distribution of soil pH at multiple depth increments for the Thompson-Okanagan region of British Columbia, Canada

  • At both the 0–5 and 5–15 cm depth increments, pH differed significantly among the three vegetation types: it was highest for grass (G), intermediate for forest intermixed with grass (FG), and lowest for forest (F)
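The comparison in objectives (1) and (3) amounts to scoring every learner's predictions against the same independent test observations, separately for each depth interval. The metrics used in the study are not stated in this section, so the sketch below assumes three that are common in DSM: root mean square error (RMSE), the coefficient of determination (R²), and Lin's concordance correlation coefficient (CCC). The function names, learner names, and pH values are hypothetical.

```python
import numpy as np


def rmse(obs, pred):
    """Root mean square error, in the units of the soil property (pH units here)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SSE/SST, computed against the observations."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    sse = np.sum((obs - pred) ** 2)
    sst = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - sse / sst)


def lins_ccc(obs, pred):
    """Lin's concordance correlation coefficient: penalizes both scatter and bias."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    cov = np.mean((obs - obs.mean()) * (pred - pred.mean()))
    denom = obs.var() + pred.var() + (obs.mean() - pred.mean()) ** 2  # population variances
    return float(2.0 * cov / denom)


def compare_learners(obs, predictions):
    """Score each learner's test-set predictions; returns {name: metrics}, lowest RMSE first."""
    scores = {
        name: {"RMSE": rmse(obs, p), "R2": r_squared(obs, p), "CCC": lins_ccc(obs, p)}
        for name, p in predictions.items()
    }
    return dict(sorted(scores.items(), key=lambda kv: kv[1]["RMSE"]))


# Usage with made-up pH values for one depth interval.
observed = [5.0, 5.5, 6.0, 6.5, 7.0]
table = compare_learners(observed, {"biased": [5.5, 6.0, 6.5, 7.0, 7.5], "exact": observed})
for name, m in table.items():
    print(name, {k: round(v, 3) for k, v in m.items()})
```

The hypothetical `biased` learner correlates perfectly with the observations, yet its CCC is 0.8 and its R² is 0.5, because both metrics penalize the constant 0.5 pH unit offset that a correlation coefficient ignores.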


Summary

Introduction

Digital soil mapping (DSM) has increasingly applied novel machine learning techniques to predict the spatial distribution of soil properties and types (Brungard et al 2015; Heung et al 2016; Khaledian and Miller 2020). Machine learning algorithms have the potential to quantify the high-dimensional and nonlinear relationships between environmental predictors and soil response variables across diverse ecosystem types. With improvements in computer technology (Rossiter 2018) and in machine learning algorithms over the past decade, more powerful learners have been designed to process larger datasets with a larger number of environmental variables. Examples of such algorithms include, but are not limited to, generalized linear regression (GLM; Hastie and Pregibon 1992), stepwise regression (STEP; Hastie and Pregibon 1992), and lasso and elastic net regularized generalized linear regression (GLMNET; Friedman et al 2010), which are capable of processing nonlinear relationships for both categorical and continuous data (Simon et al 2011). Classification and regression tree (CART) approaches form the basis of more advanced, tree-based learners such as CART with bagging (Breiman 1996a), the cubist learner (Quinlan 1992, 1993), and the random forest (RF) model (Breiman 2001).
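GLMNET (Friedman et al 2010) fits lasso and elastic net penalties by cyclic coordinate descent, updating one coefficient at a time with a soft-thresholding step. The sketch below shows that update for the Gaussian (least squares) case only. It is a simplified illustration, not the glmnet package: it fits a single penalty value λ (`lam`) rather than a warm-started path, skips glmnet's default predictor standardization and its screening rules, and the function names are hypothetical.

```python
import numpy as np


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0): the lasso shrinkage operator."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net_cd(X, y, lam, alpha=0.5, max_iter=1000, tol=1e-10):
    """Gaussian elastic net by cyclic coordinate descent.

    Minimizes (1/(2n))*||y - b0 - X b||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centering handles the unpenalized intercept
    col_ss = (Xc ** 2).sum(axis=0) / n       # (1/n) * x_j' x_j
    beta = np.zeros(p)
    resid = yc.copy()                        # residual for beta = 0
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue                     # constant column: coefficient stays 0
            # (1/n) x_j' r_(j), where r_(j) is the partial residual excluding feature j
            rho = Xc[:, j] @ resid / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1.0 - alpha))
            resid -= Xc[:, j] * (new - beta[j])  # keep the residual in sync with beta
            max_step = max(max_step, abs(new - beta[j]))
            beta[j] = new
        if max_step < tol:
            break
    intercept = y_mean - x_mean @ beta
    return intercept, beta


# Demo on synthetic, noiseless data; the third predictor is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1]
b0, b = elastic_net_cd(X, y, lam=1e-6)            # almost no penalty
b0_big, b_big = elastic_net_cd(X, y, lam=100.0)   # very strong penalty
print(round(float(b0), 3), np.round(b, 3))
print(b_big)
```

With a tiny `lam` the fit essentially reproduces least squares; with a large `lam` every coefficient is shrunk exactly to zero. This exact-zero behavior of the lasso part of the penalty is what makes the elastic net useful for variable selection when many environmental covariates are available.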

