In the last few decades, spectroscopic techniques such as near-infrared (NIR) and UV/vis spectroscopy have found wide application. As a result, various soft sensors have been developed to predict sample properties from their spectroscopic readings. Because the readings at different wavelengths are highly correlated, variable selection has been shown to significantly improve a soft sensor’s prediction performance and reduce model complexity. Currently, almost all variable selection methods focus on selecting the variables (i.e., wavelengths or wavelength segments) that are strongly correlated with the dependent variable in order to improve prediction performance. Although many successful applications have been reported, such methods have limitations, including high sensitivity to the choice of training data and deteriorated performance on new samples. One possible reason is that useful wavelengths or wavelength segments are removed during calibration, so the selection can be “tilted” toward overfitting, capturing the noise or unknown disturbances contained in the calibration data. As a result, prediction performance may deteriorate significantly when the model is extrapolated or applied to new samples. To address this limitation, we propose a feature-based soft sensor approach utilizing statistics pattern analysis (SPA). Instead of selecting certain wavelengths or wavelength segments, the SPA-based method divides the whole spectrum into segments and extracts different features from each segment to build the soft sensor. In other words, the SPA model retains the complete information of the full spectrum without any selection or removal, which we believe is the main reason for the high robustness of the SPA-based method.
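The segment-and-summarize idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which features SPA extracts, so the function name `spa_features`, the use of `np.array_split` to form contiguous segments, and the choice of four per-segment statistics (mean, standard deviation, skewness, excess kurtosis) are all hypothetical. What it does show is that every wavelength contributes to some feature and none is discarded.

```python
import numpy as np


def spa_features(spectra: np.ndarray, n_segments: int) -> np.ndarray:
    """Hypothetical SPA-style feature extraction (illustrative assumption).

    spectra: array of shape (n_samples, n_wavelengths).
    The wavelength axis is split into n_segments contiguous segments that
    together cover the whole spectrum; four statistics are computed per segment.
    Returns an array of shape (n_samples, 4 * n_segments).
    """
    if spectra.ndim != 2:
        raise ValueError("spectra must be 2-D (samples x wavelengths)")
    features = []
    for seg in np.array_split(spectra, n_segments, axis=1):
        mean = seg.mean(axis=1)
        std = seg.std(axis=1)
        # Standardize within the segment; constant segments get std 1 here,
        # so their skewness is 0 and excess kurtosis is -3 instead of NaN.
        safe_std = np.where(std > 0, std, 1.0)
        z = (seg - mean[:, None]) / safe_std[:, None]
        skew = (z ** 3).mean(axis=1)        # population (biased) skewness
        kurt = (z ** 4).mean(axis=1) - 3.0  # population excess kurtosis
        features.append(np.column_stack([mean, std, skew, kurt]))
    return np.hstack(features)


# Usage on synthetic spectra: 5 samples, 100 wavelengths, 10 segments.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))
F = spa_features(X, n_segments=10)
print(F.shape)  # (5, 40)
```

The resulting feature matrix, rather than the raw spectrum, would then serve as the input to a regression model such as PLS.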
In addition, we propose a Monte Carlo validation and testing (MCVT) procedure and three MCVT-based performance indices for consistent and fair comparison of different soft sensor methods across different datasets. The MCVT procedure and indices are also generally applicable to model comparison in other applications. Four case studies are presented to demonstrate the performance of the feature-based soft sensor and to compare it, following the proposed MCVT procedure, with models based on full partial least squares (PLS), the least absolute shrinkage and selection operator (Lasso), and synergy interval PLS (SiPLS). We also explore the potential of kernel PLS (KPLS) based soft sensor approaches, examine their performance, and discuss their pros and cons.
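The general shape of a Monte Carlo validation-and-testing loop is sketched below, under assumptions that go beyond the abstract. The abstract does not define the MCVT splits or its three indices, so this sketch uses a hypothetical function `monte_carlo_validate_test`, closed-form ridge regression as a stand-in model (not PLS, Lasso, or SiPLS), a 60/20/20 random split, and the mean and standard deviation of test RMSE as placeholder summaries. The point it illustrates is that each run draws a fresh random partition, hyperparameters are chosen on the validation split only, and the test split is used once per run for scoring.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept (stand-in model)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def monte_carlo_validate_test(X, y, alphas, n_runs=50, frac_train=0.6, frac_val=0.2, seed=0):
    """Hypothetical Monte Carlo validation/testing loop; returns one test RMSE per run."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train, n_val = int(frac_train * n), int(frac_val * n)
    test_rmses = []
    for _ in range(n_runs):
        # Fresh random partition into training / validation / test for every run.
        idx = rng.permutation(n)
        tr, va, te = idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
        # Select the hyperparameter using the validation split only.
        val_scores = []
        for a in alphas:
            coef, b = ridge_fit(X[tr], y[tr], a)
            val_scores.append(rmse(y[va], X[va] @ coef + b))
        best_alpha = alphas[int(np.argmin(val_scores))]
        # Refit on training + validation data, then score once on the untouched test split.
        trva = np.concatenate([tr, va])
        coef, b = ridge_fit(X[trva], y[trva], best_alpha)
        test_rmses.append(rmse(y[te], X[te] @ coef + b))
    return np.array(test_rmses)


# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=120)
scores = monte_carlo_validate_test(X, y, alphas=[0.01, 0.1, 1.0, 10.0], n_runs=20)
print(f"test RMSE: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

Reporting the distribution of test errors over many random splits, rather than a single split, is what makes comparisons less dependent on one lucky or unlucky choice of calibration data; the specific indices used in the paper may differ from these placeholders.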