Chemometric regression models were developed to quantify praseodymium (Pr, 0–1000 µg/mL), neodymium (Nd, 0–1000 µg/mL), and nitric acid (HNO3, 0.1–5 M) by spectrophotometry. Each designed calibration set comprised 20 samples: 10 model points and 10 lack-of-fit (LOF) points. The D-optimal designs effectively minimized the number of samples required to build models, and each design gave similar prediction performance, suggesting that statistical design of experiments can provide a reliable framework for selecting training set samples in three-variable systems. Partial least squares regression (PLSR) models were validated against a one-factor-at-a-time validation set composed of 125 samples (three variables, five levels). The top PLS-1 models gave average percent root mean square error of prediction (RMSEP) values of 3.5%, 1.7%, and 1.2% for Pr(III), Nd(III), and HNO3, respectively. Power set augmentations of the model and LOF samples were investigated to optimize the number of training set samples. PLSR models built using only the 10 required model points had predictive capabilities similar to those of models that also included the LOF points (20 samples in total), while using half as many samples. The number of validation samples was also varied systematically to determine how many samples are needed to validate regression models. This work addresses long-standing questions in the field of chemometrics to help make this approach amenable to the near-real-time quantification of hazardous species in remote settings.
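As a rough illustration of the percent RMSEP figure of merit quoted above, the following Python sketch computes an RMSEP and expresses it as a percentage. The abstract does not state how the percentage is normalized, so dividing by the span of the calibration range is an assumption here, and the function name `percent_rmsep` and the sample values are hypothetical. The sketch also checks the arithmetic that three variables at five levels each give 5^3 = 125 level combinations, which matches the stated validation set size; how the authors actually constructed that set is not specified here.

```python
import math
from itertools import product


def percent_rmsep(y_true, y_pred, span):
    """RMSEP as a percentage of `span`.

    Normalizing by the calibration span is an assumption; the abstract
    does not say which normalization was used.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return 100.0 * math.sqrt(mse) / span


# Hypothetical reference and predicted concentrations (µg/mL), not data from the study.
reference = [0.0, 250.0, 500.0, 750.0, 1000.0]
predicted = [10.0, 240.0, 510.0, 740.0, 1010.0]
print(percent_rmsep(reference, predicted, span=1000.0))  # 1.0

# Three variables at five levels each: every combination of level indices.
grid = list(product(range(5), repeat=3))
print(len(grid))  # 125
```

Every prediction in the hypothetical data is off by 10 µg/mL, so the RMSEP is 10 µg/mL, or 1.0% of a 1000 µg/mL span.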