Abstract

In this study, a systematic comparison was carried out to assess differences in accuracy between partial least squares (PLS) and support vector machine (SVM) regression algorithms for determining soil organic matter (SOM) and particle size using vis-NIR spectroscopy. The comparison investigated how the size of the calibration set influences accuracy on the external validation set. For this purpose, three vis-NIR soil libraries containing 14,212, 15,330 and 42,471 soil samples were used to determine sand, clay, and SOM content, respectively. To capture the variability of the results, each calibration subset was randomly generated 49 times, and for each iteration PLS, SVM-Linear and SVM-RBF (radial basis function) regression models were built. These calibration subsets were composed of 250, 1000, 2000, 5000 and 8000 or 10,000 samples.

In all situations, SVM-Linear gave the worst accuracy. For sand and clay determinations, SVM-RBF models showed a significant improvement in accuracy over PLS when the calibration model was built using at least 1000 samples, resulting in a reduction of ~14–29% in the RMSEP. For SOM determinations, the difference between the RMSEP values of SVM-RBF and PLS became significant when 2000 or more samples were used in the calibration set, with a reduction of ~8–22% in the RMSEP values. In addition, for all soil attributes investigated, between 20 and 27% of the external validation set (1173–2241 samples) were considered outliers and excluded by the PLS regression models.

This loss of PLS performance for large calibration sets indicates that the correlation between the vis-NIR spectra and clay, sand and SOM contents tends to become more complex as the variability/number of samples increases. This requires machine learning models with high generalization capacity, such as SVM-RBF, whose performance improved as the number of samples in the calibration set increased.
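The resampling protocol described in the abstract (repeated random calibration subsets of increasing size, each scored by RMSEP on a fixed external validation set) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the study's code: the data are synthetic, the function names (`rmsep`, `relative_reduction`, `repeated_subset_rmsep`, `fit_line`) are hypothetical, the subset sizes are scaled down, and a one-variable least-squares line stands in for the PLS, SVM-Linear and SVM-RBF models that the study actually compared.

```python
import math
import random
from statistics import mean


def rmsep(y_true, y_pred):
    """Root mean square error of prediction on an external validation set."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def relative_reduction(rmsep_ref, rmsep_new):
    """Percentage by which rmsep_new is lower than rmsep_ref."""
    return 100.0 * (rmsep_ref - rmsep_new) / rmsep_ref


def repeated_subset_rmsep(fit, cal, val, sizes, n_repeats=49, seed=0):
    """For each calibration size, draw n_repeats random subsets of `cal`,
    fit a model on each subset, and record its RMSEP on the fixed set `val`.

    cal, val: lists of (x, y) pairs.
    fit: callable taking a list of (x, y) pairs and returning a predict function.
    """
    rng = random.Random(seed)
    y_val = [y for _, y in val]
    results = {}
    for n in sizes:
        scores = []
        for _ in range(n_repeats):
            subset = rng.sample(cal, n)  # random subset without replacement
            predict = fit(subset)
            scores.append(rmsep(y_val, [predict(x) for x, _ in val]))
        results[n] = scores
    return results


def fit_line(samples):
    """Hypothetical stand-in model: one-variable least squares.
    (The study itself used PLS, SVM-Linear and SVM-RBF on vis-NIR spectra.)"""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


# Synthetic data (assumption): y = 2x plus Gaussian noise with unit variance.
data_rng = random.Random(1)
xs = [data_rng.uniform(0.0, 10.0) for _ in range(600)]
points = [(x, 2.0 * x + data_rng.gauss(0.0, 1.0)) for x in xs]
cal, val = points[:500], points[500:]

results = repeated_subset_rmsep(fit_line, cal, val, sizes=[25, 100, 250])
for n, scores in results.items():
    print(n, round(mean(scores), 3))
```

Running the same loop once per model and passing the mean RMSEP of each to `relative_reduction` gives a percentage of the kind quoted above (for example, the reduction of SVM-RBF relative to PLS), assuming that is how the reported reductions were computed.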
