Abstract

The uptake of machine learning (ML) algorithms in digital soil mapping (DSM) is transforming the way soil scientists produce their maps. Within the past two decades, soil scientists have applied ML to a wide range of scenarios, mapping soil properties or classes with various ML algorithms, at spatial scales from local to global, and across depths. The wide adoption of ML for soil mapping was made possible by the increase in data availability, the ease of accessing environmental spatial data, and the development of software solutions, aided by computational tools, to analyse them. In this article, we review the current use of ML in DSM, identify the key challenges and suggest solutions from the existing literature. There is a growing interest in the use of ML in DSM. Most studies emphasize prediction and the accuracy of the predicted maps for applications such as baseline production of quantitative soil information. Few studies account for existing soil knowledge in the modelling process or quantify the uncertainty of the predicted maps. Further, we discuss the challenges related to the application of ML for soil mapping and suggest solutions from existing studies in the natural sciences. The challenges are: sampling, resampling, accounting for the spatial information, multivariate mapping, uncertainty analysis, validation, integration of pedological knowledge and interpretation of the models. Overall, the current literature shows few attempts to understand the underlying soil structure or process using the predicted maps and the ML model, for example by generating hypotheses on mechanistic relationships among variables. In this regard, several additional challenging aspects need to be considered, such as the inclusion of pedological knowledge in the ML algorithm or the interpretability of the calibrated ML model. Tackling these challenges is critical for ML to gain credibility and scientific consistency in soil science.
We conclude that future developments of ML could incorporate three core elements: plausibility, interpretability, and explainability. These would prompt soil scientists to couple model prediction with pedological explanation and an understanding of the underlying soil processes.

Highlights

  • Conventionally, spatial prediction of soil has been embedded in the geostatistical framework (Heuvelink & Webster, 2001), in which a sample of a soil property is modelled as the sum of a linear combination of environmental covariates and a spatially autocorrelated residual, and prediction at unobserved locations is made by kriging

  • Digital soil mapping has unique characteristics which require adaptation of machine learning (ML) algorithms. These include, but are not limited to, the inclusion of pedological knowledge in the ML algorithm, accounting for the spatial structure present in the raw soil data, and the need to increase our scientific understanding of the soil from a calibrated ML model

  • We identify gaps in the knowledge and define areas in which adapting ML algorithms would be beneficial for their use in digital soil mapping (DSM)


Summary

Introduction

Conventionally, spatial prediction of soil has been embedded in the geostatistical framework (Heuvelink & Webster, 2001), in which a sample of a soil property is modelled as the sum of a linear combination of environmental covariates and a spatially autocorrelated (stochastic) residual, and prediction at unobserved locations is made by kriging.

The study concludes that random forest (RF) does not benefit from a uniform spread of the units in geographic or feature space, nor from reproducing the marginal distribution of the covariates (as is done in conditioned Latin hypercube sampling, cLHS). These results apply to RF, but sampling designs for other machine learning algorithms need further investigation. While most studies in our literature review (Table 1) use grid-based sampling or cLHS, there is evidence that most conventional sampling designs (e.g. spatial coverage sampling) are not effective for mapping with machine learning. MacKay (1992) defined an objective function that searches for the optimal units in the space spanned by the predictors (i.e. covariates) for prediction using a neural network algorithm. Taking these considerations into account and testing active learning for sampling design optimization would make a valuable contribution to digital soil mapping research.

Including other criteria to assess the overall performance of an ML model would be a step towards “conscious” digital soil mapping, and would contribute to the uptake of knowledge discovery via machine learning in soil science.
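The geostatistical model described at the opening of this section (a linear trend on covariates plus a spatially autocorrelated residual, predicted by kriging) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `exp_cov` and `regression_kriging`, the exponential covariance, the `sill` and `range_` values and the simulated data are all assumptions made for illustration. The trend is fitted by ordinary least squares and the residuals are interpolated by simple kriging, which is a simplification of a full universal kriging or generalised least squares fit.

```python
import numpy as np

def exp_cov(d, sill=1.0, range_=10.0):
    """Exponential covariance as a function of distance (assumed form)."""
    return sill * np.exp(-d / range_)

def regression_kriging(coords, X, z, coords_new, X_new, sill=1.0, range_=10.0):
    """Linear trend on covariates plus simple kriging of the residuals."""
    # Design matrices with an intercept column
    A = np.column_stack([np.ones(len(z)), X])
    A_new = np.column_stack([np.ones(len(X_new)), X_new])
    # Trend: ordinary least squares on the environmental covariates
    beta, *_ = np.linalg.lstsq(A, z, rcond=None)
    resid = z - A @ beta
    # Pairwise distances among observations, and from new points to observations
    d_obs = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_new = np.linalg.norm(coords_new[:, None, :] - coords[None, :, :], axis=-1)
    C = exp_cov(d_obs, sill, range_)    # covariance among observations
    c0 = exp_cov(d_new, sill, range_)   # covariance between new points and observations
    # Simple kriging weights solve C w = c0 for each prediction location
    weights = np.linalg.solve(C, c0.T).T
    # Prediction = trend at the new locations + kriged residual
    return A_new @ beta + weights @ resid

# Simulated example (hypothetical data, not from the article)
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(30, 2))   # sampling locations
X = rng.normal(size=(30, 2))                 # two environmental covariates
z = 2.0 + X @ np.array([1.5, -0.5]) + rng.normal(scale=0.3, size=30)
coords_new = rng.uniform(0, 100, size=(5, 2))
X_new = rng.normal(size=(5, 2))
pred = regression_kriging(coords, X, z, coords_new, X_new)
```

Because no nugget is included, this predictor reproduces the observations exactly at the sampled locations. In practice the covariance parameters would be estimated from the data rather than fixed in advance.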

