Abstract

The application of machine learning (ML) techniques in various fields of science has increased rapidly, especially in the last 10 years. The increasing availability of soil data that can be efficiently acquired remotely and proximally, together with freely available open-source algorithms, has led to an accelerated adoption of ML techniques to analyse soil data. Given the large number of publications, it is impossible to manually review all papers on the application of ML in soil science without narrowing the review down to a specific research question. This paper aims to provide a comprehensive review of the application of ML techniques in soil science, aided by an ML algorithm (latent Dirichlet allocation) that finds patterns in a large collection of texts. The objective is to gain insight into publications on ML applications in soil science and to discuss the research gaps in this topic. We found that (a) there is an increasing usage of ML methods in soil sciences, mostly concentrated in developed countries; (b) the reviewed publications can be grouped into 12 topics, namely remote sensing, soil organic carbon, water, contamination, methods (ensembles), erosion and parent material, methods (neural networks, NN; support vector machines, SVM), spectroscopy, modelling (classes), crops, physical, and modelling (continuous); and (c) advanced ML methods usually perform better than simpler approaches thanks to their capability to capture non-linear relationships. From these findings, we identified research gaps, in particular regarding the precautions that should be taken (parsimony) to avoid overfitting and the interpretability of ML models, which is an important aspect to consider when applying advanced ML methods in order to improve our knowledge and understanding of soil. We foresee that a large number of studies will focus on the latter topic.
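To make the topic-modelling step mentioned in the abstract more concrete, the following is a minimal sketch of latent Dirichlet allocation fitted by collapsed Gibbs sampling. It is not the authors' implementation, which is not described here. The function names `fit_lda` and `top_words`, the toy documents, the choice of 2 topics, and the hyperparameters `alpha` and `beta` are all assumptions made for illustration; the review itself groups publications into 12 topics.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word, and tokens per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each document.
    theta = [
        [(c + alpha) / (sum(row) + n_topics * alpha) for c in row]
        for row in doc_topic
    ]
    return vocab, topic_word, theta


def top_words(vocab, topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Toy "abstracts", invented for illustration (not from the reviewed corpus).
docs = [
    "soil organic carbon stock carbon mapping".split(),
    "organic carbon prediction soil carbon".split(),
    "carbon stock soil organic matter".split(),
    "remote sensing satellite imagery soil moisture".split(),
    "satellite remote sensing imagery classification".split(),
    "imagery sensing satellite moisture mapping".split(),
]

vocab, topic_word, theta = fit_lda(docs, n_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {', '.join(words)}")
```

In practice, a maintained library implementation would be a more sensible choice for a corpus of real publications; the sampler above is only meant to show how word co-occurrence drives the grouping of documents into topics.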

Highlights

  • The application of machine learning (ML) techniques in various fields of science has increased rapidly, especially in the last 10 years

  • To illustrate the gradient between simple and complex models, we considered a linear model (LM) with 2 variables to be simple compared to an LM with 100 variables; a classification and regression tree (CART) with 2 branches to be simple compared to a CART with 100 branches; and a CART with 2 branches to be simple compared with an LM with 100 variables

  • ML techniques in the context of soil sciences are used in many countries around the world, but are mostly concentrated in developed countries

Summary

Introduction

The application of machine learning (ML) techniques in various fields of science has increased rapidly, especially in the last 10 years. Pedometrics, in particular, has used statistical models to “learn” or understand from data how soil is distributed in space and time (McBratney et al., 2019). Machine learning analysis of soil data is used to draw conclusions about the controls on the distribution of soil. The terms ML and artificial intelligence (AI) are sometimes used interchangeably. This confusion is understandable, since ML is a subset of AI, but not everything related to AI falls into the ML category (e.g. expert systems).
