Prediction of various soil properties for a national spatial dataset of Scottish soils based on four different chemometric approaches: A comparison of near infrared and mid-infrared spectroscopy

R.K Haghi, E Pérez-Fernández, A.H.J Robertson

doi:10.1016/j.geoderma.2021.115071

Abstract

Infrared spectroscopic techniques, in combination with chemometric approaches, have been widely used to estimate physical and chemical properties of soil samples. This study assesses the performance of diffuse reflectance spectroscopy in the near-infrared (NIR) region and attenuated total reflection Fourier transform infrared spectroscopy (ATR-FTIR) in the mid-infrared (MIR) region for predicting nine soil properties: total carbon, total nitrogen, bulk density, clay, sand, silt, pH (in H2O), exchangeable Mg and exchangeable K. The predictive performance of four regression methods, namely partial least squares (PLS), support vector regression (SVR), Cubist and convolutional neural networks (CNN), was investigated in combination with different pre-processing approaches. For the CNN, the FTIR and NIR spectra were also converted to spectrograms to examine the prediction accuracy of a two-dimensional convolutional neural network (CNN-2D) and to compare it with that of a one-dimensional convolutional neural network (CNN-1D, which takes the spectral data directly as input). To achieve these objectives, we used a spectral library of 650 samples from the National Soils Inventory of Scotland (NSIS) dataset, collected on a 20 km grid throughout Scotland between 2007 and 2009. The FTIR and NIR data were each split into calibration (nc = 520) and validation (nv = 130) sets. Our results show that the regression models built on FTIR data have better predictive performance than those built on NIR data for all the studied soil properties except pH, with improvements in root mean square error of prediction (RMSEP) of 3–61%. Comparing the chemometric approaches, for both NIR and FTIR, the CNN-1D models perform better than PLS, SVR and CNN-2D for all the studied soil properties in terms of RMSEP.
We found that the CNN-1D models created using the NIR spectral dataset were superior (lower RMSEP values) to those developed using the Cubist approach for total carbon, total nitrogen, clay, silt, pH and exchangeable K, whereas the Cubist models for sand and exchangeable Mg performed slightly better than CNN-1D. For the FTIR data, the Cubist models for total carbon and silt performed best (with the lowest RMSEP and highest residual prediction deviation (RPD) values), while for the remaining properties CNN-1D outperformed PLS, SVR, Cubist and CNN-2D. We also calculated the impact of the NIR and FTIR variables used by the Cubist and CNN-1D models in predicting the different soil properties. We used Shapley Additive Explanations (SHAP) values, a game-theoretic approach, to interpret the output of the CNN-1D models.
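The two figures of merit quoted in the abstract, RMSEP and RPD, can be illustrated with a minimal sketch. The definitions are assumptions, since the abstract does not spell them out: RMSEP is taken as the root mean square of the validation residuals, RPD as the sample standard deviation of the reference values divided by RMSEP, and the percentage improvement as the relative reduction in RMSEP. The function names are hypothetical and the numbers in the usage note are made up for illustration; none of this comes from the study.

```python
import numpy as np


def rmsep(y_ref, y_pred):
    """Root mean square error of prediction on a validation set."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_ref - y_pred) ** 2)))


def rpd(y_ref, y_pred):
    """Residual prediction deviation.

    Assumed definition: sample standard deviation (ddof=1) of the
    reference values divided by RMSEP. Higher is better.
    """
    sd_ref = np.std(np.asarray(y_ref, dtype=float), ddof=1)
    return float(sd_ref / rmsep(y_ref, y_pred))


def rmsep_improvement_pct(rmsep_baseline, rmsep_new):
    """Relative RMSEP reduction in percent (assumed definition)."""
    return 100.0 * (rmsep_baseline - rmsep_new) / rmsep_baseline
```

For example, `rmsep_improvement_pct(2.0, 1.0)` gives a 50% improvement under this definition, and calling `rpd(y_ref, y_pred)` on a validation set returns a value that grows as the predictions tighten around the reference values.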

Full Text