The research octane number (RON) is a key indicator of gasoline quality, and near-infrared (NIR) spectroscopy offers a rapid, non-destructive means of determining it. When a near-infrared spectrometer is used to obtain the RON of gasoline, sharing the analysis model among different instruments greatly reduces the cost of re-modeling and model maintenance. To share an NIR spectroscopy analysis model for RON between two portable near-infrared spectrometers of the same model, two ensemble learning algorithms, random forest (RF) and extreme gradient boosting (XGBoost), were investigated alongside two other machine learning algorithms, support vector regression (SVR) and decision tree (DT). Based on the RON of 120 gasoline samples and their NIR spectra collected on the two spectrometers, hybrid and pure models were established with SVR, DT, RF, and XGBoost, and their sharing performance was compared. To further simplify the model and improve its robustness and prediction accuracy, three characteristic wavelength selection strategies, uninformative variable elimination (UVE), the successive projections algorithm (SPA), and competitive adaptive reweighted sampling (CARS), were also applied to optimize the model. The results showed that the hybrid model based on the CARS-RF method yielded the best prediction performance, with coefficients of determination (R²) of 0.96, 0.86, and 0.94 for the prediction set of instrument A, the prediction set of instrument B, and the hybrid prediction set of the two instruments, respectively. Therefore, hybrid modeling based on ensemble learning algorithms, combined with an appropriate wavelength selection strategy, can effectively improve the robustness and universality of the model and enables gasoline RON models to be shared between two near-infrared spectrometers of the same model.
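The evaluation described above, one R² per instrument plus one for the hybrid prediction set, can be sketched as follows. This is a minimal, model-agnostic illustration and not the authors' implementation: the names `r2_score`, `pool`, and `evaluate_sharing` are hypothetical, the hybrid prediction set is assumed to be the simple concatenation of the two instruments' prediction sets, and `predict` stands in for any fitted regressor (for example an RF model trained on CARS-selected wavelengths).

```python
from typing import Callable, Dict, List, Sequence, Tuple

Spectrum = Sequence[float]
# A dataset is (spectra, reference RON values) measured on one instrument.
Dataset = Tuple[List[Spectrum], List[float]]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot


def pool(a: Dataset, b: Dataset) -> Dataset:
    """Assumption: a hybrid set is the concatenation of both instruments' samples."""
    return (list(a[0]) + list(b[0]), list(a[1]) + list(b[1]))


def evaluate_sharing(
    predict: Callable[[Spectrum], float],
    test_a: Dataset,
    test_b: Dataset,
) -> Dict[str, float]:
    """Score one model on instrument A, instrument B, and the pooled set."""
    sets = {"A": test_a, "B": test_b, "hybrid": pool(test_a, test_b)}
    return {
        name: r2_score(y, [predict(x) for x in spectra])
        for name, (spectra, y) in sets.items()
    }
```

Under the same assumption, a hybrid calibration set would be built with `pool(train_a, train_b)` before fitting, whereas a pure model would be fitted on a single instrument's calibration set and then scored on both.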