Abstract

We investigated the effect of both the calibration set size (number of samples) and the calibration sampling strategy on the performance of vis–NIR models for predicting clay content and exchangeable Ca (Ca++). We evaluated three calibration sampling algorithms that are commonly used in spectroscopy and digital soil mapping: Kennard–Stone (KSS), conditioned Latin hypercube (cLHS) and fuzzy c-means (FCMS). The algorithms were tested separately on a field-scale dataset and a regional-scale dataset. For each dataset we randomly selected a validation subset, and the remaining samples were used as candidates for calibration sampling. The accuracy of the vis–NIR models of clay content and Ca++ was compared on the basis of the sampling algorithm used to select the calibration samples. We also tested 38 different calibration set sizes, varying from 10 to 380 samples. The vis–NIR models were calibrated using the support vector machine (SVM) regression algorithm. The training root mean square error (RMSE), the normalized RMSE and the prediction RMSE were used to evaluate the sensitivity of the models to both the sampling algorithm and the calibration set size. In addition, we investigated the sample representativeness of each algorithm, and we suggest a novel and simple methodology for identifying an adequate calibration set size based only on the vis–NIR data (i.e. without prior knowledge of the response variables).

As expected, our results show that the error of the soil vis–NIR models depends on the calibration set size. When the number of calibration samples is relatively small, the sampling algorithm may play an important role in the accuracy of the vis–NIR models. On the other hand, if the calibration set size is large enough, the sampling method is not a critical issue. Concerning sample representativeness, we found for all the algorithms that the original distribution of the vis–NIR data is better replicated as the calibration set size increases. The results indicate that the calibration samples selected by the cLHS and FCMS algorithms replicate the original vis–NIR distribution of all the samples better than those selected by the KSS algorithm.
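
For readers unfamiliar with the Kennard–Stone algorithm mentioned above, the following is a minimal sketch of its usual max-min distance formulation, not the implementation used in this study. The function name `kennard_stone`, the use of Euclidean distance on raw feature rows, and the toy data are all assumptions made for illustration; in practice the rows would typically be spectra or a compressed representation of them.

```python
import math


def kennard_stone(X, k):
    """Return indices of k rows of X picked by max-min distance selection.

    Illustrative sketch only: Euclidean distance on the rows of X is an
    assumption, and the full distance matrix is built in memory.
    """
    n = len(X)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")

    # Pairwise Euclidean distances between all candidate samples.
    dist = [[math.dist(a, b) for b in X] for a in X]

    # Seed the selection with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [i0, j0]

    # min_d[i] is the distance from sample i to its nearest selected sample.
    min_d = [min(dist[i][i0], dist[i][j0]) for i in range(n)]

    while len(selected) < k:
        remaining = [i for i in range(n) if i not in selected]
        # Add the candidate that lies farthest from the current selection.
        nxt = max(remaining, key=lambda i: min_d[i])
        selected.append(nxt)
        min_d = [min(min_d[i], dist[i][nxt]) for i in range(n)]

    return selected


# Toy one-dimensional data (made up for illustration, not from the study).
toy = [[0.0], [1.0], [2.0], [10.0], [5.0]]
print(kennard_stone(toy, 3))
```

On the toy data the two extremes (indices 0 and 3) are chosen first, followed by the point midway between them (index 4). This shows how the procedure spreads samples across the range of the data, which is a different goal from reproducing their density.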
