The aim of this study was to address the ambiguity regarding which machine learning algorithms should be selected for evaluation in digital soil organic carbon (SOC) mapping. To this end, it provides a comprehensive assessment of the prediction accuracy of 15 machine learning algorithms frequently used in digital SOC mapping, as identified from studies indexed in the Web of Science Core Collection (WoSCC), offering a basis for algorithm selection in future studies. Two study areas, mainland France and the Czech Republic, were used, with 2514 and 400 soil samples, respectively, from the LUCAS 2018 dataset. In terms of prediction accuracy, Random Forest ranked first for mainland France and was also highly ranked for the Czech Republic, with coefficients of determination (R²) of 0.411 and 0.249, respectively, which is consistent with its dominant presence in previous WoSCC-indexed studies. Additionally, the accuracy of the K-Nearest Neighbors and Gradient Boosting Machine regression algorithms, relative to how frequently they appear in WoSCC-indexed studies, indicates that they are underrated and should be considered more often in future digital SOC mapping studies. Future studies should consider study areas that are not strictly bounded by human-made administrative borders, as well as more interpretable machine learning and ensemble machine learning approaches.
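The abstract ranks algorithms by their coefficient of determination (R²). The sketch below assumes the standard definition R² = 1 − SS_res / SS_tot computed on held-out samples. The abstract does not state the exact validation scheme (for example, a split or cross-validation), so that part is not reproduced here. The function names, algorithm labels, and SOC values are hypothetical placeholders for illustration and are not data or results from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variance of observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


def rank_algorithms(observed, predictions_by_algorithm):
    """Return (name, R²) pairs sorted from highest to lowest R² (hypothetical helper)."""
    scores = {
        name: r_squared(observed, preds)
        for name, preds in predictions_by_algorithm.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical held-out SOC values (g/kg) and model predictions; NOT data from the study.
observed = [12.0, 18.5, 25.0, 9.5, 30.0]
predictions = {
    "RandomForest": [13.0, 17.0, 23.5, 11.0, 27.0],
    "KNearestNeighbors": [14.0, 16.0, 22.0, 12.0, 26.0],
    # Always predicting the observed mean yields R² = 0 by construction.
    "MeanBaseline": [19.0] * 5,
}

for name, score in rank_algorithms(observed, predictions):
    print(f"{name}: R² = {score:.3f}")
```

The mean-only baseline is included because it anchors the scale. An R² near zero, like the 0.249 reported for the Czech Republic, means a model explains only a modest share of SOC variance beyond simply predicting the mean.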