Abstract

Laser-induced breakdown spectroscopy (LIBS) data acquired from 2959 geochemical standards allow the effects of training set size on LIBS accuracy in geochemical analyses to be evaluated. In addition, LIBS prediction accuracies are quantified for 65 elements based on a typical benchtop instrument. Analyses used two equivalent, randomly selected subsets of the full data set to compare prediction accuracies of partial least squares (PLS) models trained on 75, 50, 25, 10, 5, 2.5, 1, and 0.5% of the total data set, with the remainder used for testing. The number of components in the PLS models, a measure of model complexity, was shown to increase with the size of the training set. Based on root mean square errors (RMSE) on unseen test data, our results show that the larger the training set, the lower the prediction error (and thus the better the accuracy) on unseen data. Calibration (training set) size was shown to have a first-order effect on prediction accuracy relative to spectral resolution and detector sensitivity. Different methods of assessing model accuracy using RMSE are compared, including the error of calibration (RMSE-C), the error of cross-validation (RMSE-CV), and the error of prediction (RMSE-P). Use of RMSE-C is inappropriate because the samples being predicted are those on which the model was trained. In data sets that are sufficiently large, held-out test data (RMSE-P) provide the best measure of prediction accuracy, while RMSE-CV is useful only as an estimate of subsequent model performance. Increasing the number of cross-validation folds for our large data set yields surprisingly comparable RMSE-CV values for models with five or more (up to 100) folds, but this result is likely not applicable to smaller data sets and needs further evaluation.
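To make the RMSE-C / RMSE-CV / RMSE-P distinction and the training-fraction sweep concrete, the following is a minimal sketch using scikit-learn's PLSRegression on synthetic stand-in spectra. The data, the fixed component cap, the 5-fold cross-validation, and all parameter choices are illustrative assumptions, not the paper's actual pipeline or data.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import train_test_split, KFold, cross_val_predict
    from sklearn.metrics import mean_squared_error

    # Synthetic stand-in for LIBS spectra: rows are standards, columns are channels.
    rng = np.random.default_rng(0)
    n_samples, n_channels = 2959, 500
    X = rng.normal(size=(n_samples, n_channels))
    true_coefs = rng.normal(size=n_channels)
    y = X @ true_coefs + rng.normal(scale=0.5, size=n_samples)  # proxy for element abundance

    # Sweep the training fractions used in the study; the rest is held out for testing.
    for train_frac in [0.75, 0.50, 0.25, 0.10, 0.05, 0.025, 0.01, 0.005]:
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, train_size=train_frac, random_state=0
        )
        # Cap components by training size (the paper tunes components per model;
        # a fixed cap is used here only to keep the sketch short).
        n_comp = min(10, max(1, X_train.shape[0] // 2))
        pls = PLSRegression(n_components=n_comp).fit(X_train, y_train)

        # RMSE-C: error on the calibration (training) samples themselves.
        rmse_c = mean_squared_error(y_train, pls.predict(X_train)) ** 0.5
        # RMSE-CV: error from k-fold cross-validation within the training set.
        y_cv = cross_val_predict(
            PLSRegression(n_components=n_comp), X_train, y_train,
            cv=KFold(n_splits=5, shuffle=True, random_state=0),
        )
        rmse_cv = mean_squared_error(y_train, y_cv) ** 0.5
        # RMSE-P: error on held-out test samples (the preferred measure).
        rmse_p = mean_squared_error(y_test, pls.predict(X_test)) ** 0.5
        print(f"train={train_frac:>5.1%}  RMSE-C={rmse_c:.3f}  "
              f"RMSE-CV={rmse_cv:.3f}  RMSE-P={rmse_p:.3f}")

On a run of this sketch, RMSE-C will typically read lower than RMSE-P, which illustrates why error measured on the training samples flatters the model and why held-out test error is the preferred accuracy measure.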
