To improve prediction performance and reduce artifacts in Raman spectra, we developed a preprocessing method based on eXtreme Gradient Boosting (XGBoost) and applied it to the Raman spectra of glucose, glycerol and ethanol mixtures. To verify the robustness and reliability of the XGBoost preprocessing method, we built two models: X-LR, which combines XGBoost preprocessing with a linear regression (LR) model, and X-MLP, which combines XGBoost preprocessing with a multilayer perceptron (MLP) model. Both models were used to quantify the concentrations of glucose, glycerol and ethanol from the Raman spectra of mixed solutions. A proportion map of hyperparameters was first used to narrow the hyperparameter search range for the X-LR and X-MLP models. The coefficient of determination (R²), root mean square error of calibration (RMSEC), and root mean square error of prediction (RMSEP) were then used to evaluate model performance. Experimental results indicated that the XGBoost preprocessing method achieved higher accuracy and better generalization than the other preprocessing methods tested, for both the LR and MLP models.
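
The three evaluation metrics named above can be illustrated with a minimal sketch. It assumes that R2 denotes the coefficient of determination, that RMSEC and RMSEP are the same root mean square error computed on the calibration and prediction sets respectively, and that the error is averaged over the number of samples (some chemometrics texts divide by degrees of freedom instead). The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted concentrations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical concentrations, for illustration only (not data from the study).
cal_true, cal_pred = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]
test_true, test_pred = [1.5, 2.5, 3.5], [1.4, 2.7, 3.3]

rmsec = rmse(cal_true, cal_pred)           # error on the calibration set
rmsep = rmse(test_true, test_pred)         # error on the held-out prediction set
r2_pred = r_squared(test_true, test_pred)  # fit quality on the prediction set
```

Under these assumptions, an RMSEP close to the RMSEC suggests that a model generalizes beyond its calibration data, while an RMSEP much larger than the RMSEC points to overfitting.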