Estimating soil organic carbon (SOC) content plays an important role in assessing soil health. Visible and near-infrared diffuse reflectance spectroscopy (Vis-NIR DRS) is a fast and inexpensive tool for measuring SOC. However, when this technique is applied over a larger area, prediction accuracy decreases because of the heterogeneity of the samples. In this paper, we first investigate global model performance on the LUCAS EU-wide topsoil database. We then test different clustering strategies, including k-means clustering based on principal component analysis (PCA) and hierarchical clustering, combined with partial least squares regression (PLSR) models, and a clustering based on a local PLSR approach. The best validation results were obtained with the local PLSR approach, with R² = 0.75, root mean squared error of prediction (RMSEP) = 13.38 g/kg and ratio of performance to interquartile range (RPIQ) = 2.846, but its running time was 30.05 s. The k-means clustering method gave similar results, with R² = 0.75, RMSEP = 14.61 g/kg and RPIQ = 2.844, in only 4.52 s. This study demonstrates that a PLSR approach based on k-means clustering can achieve prediction accuracy similar to that of the local PLSR approach while substantially reducing running time. This provides a theoretical basis for adapting spectral soil models to the needs of real-time SOC quantification.
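The three validation metrics reported above can be illustrated with a minimal sketch. It is not the authors' code. The function name `validation_metrics` and the sample values are hypothetical, and two conventions are assumed: R² is taken as the coefficient of determination (1 minus the ratio of residual to total sum of squares), and the interquartile range is computed on the observed values with the "inclusive" quantile method. The paper may define either differently.

```python
import math
import statistics


def validation_metrics(observed, predicted):
    """Return (R2, RMSEP, RPIQ) for paired observed/predicted SOC values (g/kg).

    Assumed conventions, not confirmed by the source:
    R2 = 1 - SS_res / SS_tot, RPIQ = IQR(observed) / RMSEP.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(observed)
    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    # Root mean squared error of prediction, in the units of the input.
    rmsep = math.sqrt(ss_res / n)
    # Interquartile range (Q3 - Q1) of the observed values.
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    rpiq = (q3 - q1) / rmsep
    return r2, rmsep, rpiq


# Illustrative values only, not taken from the LUCAS database.
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 39.0, 47.0]
r2, rmsep, rpiq = validation_metrics(observed, predicted)
print(f"R2={r2:.3f} RMSEP={rmsep:.2f} g/kg RPIQ={rpiq:.2f}")
```

Because RPIQ divides the spread of the observed values by the prediction error, a higher RPIQ indicates a better model, whereas a lower RMSEP does.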